Data processing systems

ABSTRACT

Operating a data processing system including producing data in the form of plural blocks of data, where each block of data represents a particular region of an output data array, storing the data in a memory of the data processing system, and reading the data from the memory in the form of lines. Storing the data in the memory comprises storing each block of data of a first row of blocks of data in the memory at one or more memory addresses of a first set of memory addresses of a sequence of memory addresses for the memory, and storing each block of data of a second row of blocks of data in the memory at one or more memory addresses of a second set of different memory addresses of the sequence of memory addresses for the memory.

BACKGROUND

The technology described herein relates to data processing systems, and in particular to the processing of data when generating an image for display on a display in a data processing system.

In data processing systems, it can often be the case that data is generated or otherwise provided in a format that is different to a format that is subsequently required. This may be the case, for example, when processing an image for display. In this case, data in respect of an image to be displayed may be generated in the form of plural two-dimensional blocks (arrays) of data positions (e.g. “tiles”), but may be further processed and/or provided to a display (such as a display panel) in the form of plural one-dimensional lines of data positions (e.g. raster lines).

One exemplary such arrangement is in a display controller, where input blocks (arrays) of data may be used (consumed) in the form of raster lines (e.g. when generating output frames for display on a display).

In such arrangements, the data processing system must effectively convert from one format to the other. This can be achieved using a so-called “de-tiler”, where the data is written in the form of plural blocks of data to a buffer, and is then read out from the buffer in the form of lines of data.

The Applicants believe that there remains scope for improvements to data processing systems that operate in this manner.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments of the technology described herein will now be described by way of example only and with reference to the accompanying drawings, in which:

FIG. 1 shows schematically a data processing system in accordance with an embodiment of the technology described herein;

FIG. 2 shows schematically a display controller in accordance with an embodiment of the technology described herein;

FIG. 3 shows schematically a portion of a display controller in accordance with an embodiment of the technology described herein;

FIG. 4 shows schematically the operation of a display controller in accordance with an embodiment of the technology described herein;

FIG. 5 shows schematically the operation of a display controller in accordance with an embodiment of the technology described herein;

FIG. 6 shows schematically the operation of a display controller in accordance with another embodiment of the technology described herein;

FIG. 7 shows schematically the operation of a display controller in accordance with an embodiment of the technology described herein;

FIG. 8 shows schematically the operation of a display controller in accordance with an embodiment of the technology described herein; and

FIG. 9 shows schematically the operation of a display controller in accordance with an embodiment of the technology described herein.

Like reference numerals are used for like components throughout the drawings, where appropriate.

DETAILED DESCRIPTION

A first embodiment of the technology described herein comprises a method of operating a data processing system comprising:

producing data in the form of blocks of data, where each block of data represents a particular region of an output data array;

storing the data in a memory of the data processing system; and

reading the data from the memory in the form of lines;

wherein storing the data in the memory comprises:

storing each block of data of a first row of blocks of data in the memory at one or more memory addresses of a first set of memory addresses of a sequence of memory addresses for the memory; and

storing each block of data of a second row of blocks of data in the memory at one or more memory addresses of a second set of different memory addresses of the sequence of memory addresses for the memory;

wherein at least some of the memory addresses of the second set of memory addresses fall between memory addresses of the first set of memory addresses in the sequence of memory addresses for the memory.

A second embodiment of the technology described herein comprises a data processing system comprising:

a first processing stage operable to produce data in the form of plural blocks of data, where each block of data represents a particular region of an output data array;

a second processing stage operable to read the data from the memory in the form of lines; and

a memory;

wherein the data processing system is configured to store the data in the memory by:

storing each block of data of a first row of blocks of data in the memory at one or more memory addresses of a first set of memory addresses of a sequence of memory addresses for the memory; and

storing each block of data of a second row of blocks of data in the memory at one or more memory addresses of a second set of different memory addresses of the sequence of memory addresses for the memory;

wherein at least some of the memory addresses of the second set of memory addresses fall between memory addresses of the first set of memory addresses in the sequence of memory addresses for the memory.

The technology described herein is concerned with a method of operating a data processing system in which data is produced in the form of plural blocks of data that each represent a particular region of an output data array, and is then read in the form of lines.

In the technology described herein, blocks of data of a first row and a second row of blocks of data, e.g. of the output data array, are stored in a memory of the data processing system at respective memory addresses of a sequence of memory addresses for the memory.

However, at least some of the memory addresses at which data blocks of the second row are stored fall between memory addresses at which data blocks of the first row are stored in the sequence of memory addresses for the memory. In other words, rather than, e.g., storing the blocks of data of each row in the memory at contiguous memory addresses in the sequence of memory addresses, the first and second rows of blocks of data are instead stored in the memory such that their respective blocks of data are “mixed in”, e.g. interleaved, together in the sequence of memory addresses for the memory.
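The difference between contiguous and “mixed in” storage can be illustrated with a minimal sketch. The passage above does not fix a particular pattern, so the simple alternating (block-by-block) interleave used here is only an assumption for illustration, and the function names and the choice of one block per slot are hypothetical.

```python
# Illustrative only: the alternating pattern below is an assumed example of
# "mixed in" storage, not the specific pattern used by any embodiment.

def contiguous_slot(row_index, block_index, blocks_per_row):
    """Block slot when each row occupies its own contiguous address range."""
    return row_index * blocks_per_row + block_index


def interleaved_slot(row_index, block_index, num_rows=2):
    """Block slot when the blocks of num_rows rows alternate in the sequence."""
    return block_index * num_rows + row_index


blocks_per_row = 4
first_set = [interleaved_slot(0, b) for b in range(blocks_per_row)]
second_set = [interleaved_slot(1, b) for b in range(blocks_per_row)]

# Some slots of the second row fall between slots of the first row.
falls_between = any(min(first_set) < s < max(first_set) for s in second_set)
```

With four blocks per row, the first row occupies slots 0, 2, 4 and 6 and the second row occupies slots 1, 3, 5 and 7, so the second set falls between members of the first set, whereas the contiguous mapping would place the second row entirely after the first.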

As will be described in more detail below, this arrangement can facilitate more efficient and improved operation of the data processing system.

In particular, storing plural rows of blocks of data in the memory, e.g. whenever this is possible based on the size of the data block rows, has the effect of increasing the latency tolerance of the system. This is because, in de-tiler arrangements, where data is produced in the form of blocks of data and is then read in the form of lines, a full line width (i.e. a row) of blocks of data of the output array should be produced before the data can be read in the form of lines. This could lead to a bottleneck in the data processing system, where an entire row of blocks of data must be produced and stored in the memory before the data can be read from the memory. Accordingly, storing plural rows of blocks of data in the memory allows the data processing system to begin reading the data (in the form of lines) when the memory is only partially filled with data, and at the same time as (further) data is being produced and stored in the memory.

This also means that, where it is necessary for the data processing system to fetch data to be used to produce the blocks of data, e.g. by issuing read requests to an external memory via an interconnect or bus of the overall data processing system, the read requests in respect of plural rows of data blocks can be grouped together (rather than the read requests for each row being issued separately), thereby increasing the amount of silence on the bus (e.g. for use by other elements of the data processing system), and facilitating more efficient operation of the data processing system.

Moreover, storing plural rows of blocks of data in the memory such that their respective blocks of data are “mixed in” together, e.g. according to a particular pattern that will be described in more detail below, can allow the memory that is provided for storing the data to be utilised more efficiently, and can facilitate more efficient reading of the data in the form of lines by the data processing system.

It will be appreciated, therefore, that the technology described herein provides an improved data processing system and method of operating a data processing system.

The data processing system of the technology described herein is operable to produce data in the form of blocks of data, where each block of data represents a particular region (area) of an output data array.

The output data array in an embodiment comprises an array of plural data positions, with each data position taking a particular data (e.g. colour) value. In an embodiment, the data comprises image data, i.e. one or more arrays of image (colour) data, e.g. one or more frames, for display or otherwise.

In an embodiment, the data is produced in the form of (comprises) plural data words (e.g. such as plural Advanced eXtensible Interface (AXI) words), where each word in an embodiment includes data in respect of multiple data positions of the data array.

Each block of data represents a particular region of the output data array. Each block of data should, and in an embodiment does, comprise at least two rows and at least two columns of data positions of the data array.

Thus, in an embodiment, the or each array of data produced by the data processing system is divided or partitioned into a plurality of identifiable smaller regions each representing a part of the overall array, and that can accordingly be represented as blocks of data.

The sub division of the or each array into blocks of data can be done as desired, and each block of data can represent any suitable and desired region (area) of the overall array of data.

Each block of data in an embodiment represents a different part (sub-region) of the overall array (although the blocks could overlap if desired). Each block should represent an appropriate portion (area) of the array (plurality of data positions within the array).

In an embodiment, the or each array of data produced by the data processing system is divided into regularly sized and shaped regions (blocks of data), in an embodiment in the form of squares or rectangles. Suitable data block sizes would be, e.g., 8×8, 16×8, 16×16, 32×4, 32×8, or 32×32 data positions in the data array. Other arrangements would, of course, be possible.

Thus, in an embodiment the or each array of data is divided into an array of regularly sized and shaped regions (blocks of data), e.g. such that the or each array comprises plural rows of blocks of data (e.g. that in an embodiment include the first and second rows of blocks of data of the technology described herein) and plural columns of blocks of data.

Each row of blocks of data should, and in an embodiment does, comprise a row of blocks of data that is one block high and many blocks wide (long). Each row of blocks of data in an embodiment has a width corresponding to (equal to) the width of the overall array of data produced by the (first processing stage of the) data processing system. Each row of blocks of data in an embodiment has a height corresponding to (equal to) the height of a single block of data produced by the (first processing stage of the) data processing system.

Correspondingly, each column of blocks of data should, and in an embodiment does, comprise a column of blocks of data that is one block wide and many blocks high. Each column of blocks of data in an embodiment has a height corresponding to (equal to) the height of the overall array of data produced by the (first processing stage of the) data processing system. Each column of blocks of data in an embodiment has a width corresponding to (equal to) the width of a single block of data produced by the (first processing stage of the) data processing system.

In an embodiment, each data block produced by the data processing system corresponds to a “tile”, e.g. that a (first) processing stage of the data processing system produces as its output.

(In tile-based data processing systems, the two dimensional output array (e.g. frame) of the data processing system is sub-divided or partitioned into a plurality of smaller regions, usually referred to as “tiles”, for the data processing. The tiles (sub-regions) may each be processed separately (e.g. one after another or in parallel). The tiles (sub-regions) may be recombined, if desired, to provide the complete output array (frame), e.g. for display.

Other terms that are commonly used for “tiling” and “tile based” processing include “chunking” (the sub-regions are referred to as “chunks”) and “bucket” data processing. The terms “tile” and “tiling” will be used herein for convenience, but it should be understood that these terms are intended to encompass all alternative and equivalent terms and techniques.)

Where the data is produced as data words, each block of data in an embodiment comprises plural data words (e.g. plural AXI words), with each word in an embodiment including data in respect of multiple data positions of the block. Each of the blocks of data in an embodiment comprises the same number of data words, e.g. N data words.

The data processing system of the technology described herein may produce data in the form of blocks of data in any suitable manner. The blocks of data are in an embodiment produced in a block by block manner, i.e. from one block to the next, in an embodiment in raster line order (i.e. where the blocks of one row are produced in order, followed by the blocks of the next row, etc.), e.g. across the entire output data array.

The data processing system may comprise a first processing stage that is operable to produce data in the form of blocks of data. The first processing stage may comprise, for example, a decoder, a rotation stage, a graphics processing unit (GPU), a central processing unit (CPU), a video codec, a compositor, etc.

There may be a single first processing stage or there may be plural first processing stages operable to produce data in the form of blocks of data. Where there are plural first processing stages, then each first processing stage is in an embodiment operated in the manner of the first processing stage described above.

The data may be produced by the first processing stage generating the data, e.g. by generating the data itself, and/or by reading or receiving data from elsewhere (such as from memory or one or more other processing stages of the data processing system), and then processing (e.g. modifying) that data.

In embodiments where data is read from memory, the memory may comprise any suitable memory and may be configured in any suitable and desired manner. For example, it may be a memory that is on chip with and/or local to the processing stage in question or it may be an external memory. In an embodiment it is an external memory, such as a main memory of the data processing system. It may be dedicated memory for this purpose or it may be part of a memory that is used for other data as well. In an embodiment the data is read from (and stored in) a frame buffer.

Correspondingly, in embodiments where data is read from memory, the data processing system and/or the first processing stage may comprise a read controller, such as a Direct Memory Access (DMA) read controller operable to read data from the memory.

The data processing system and/or the first processing stage and/or the read controller may be configured to issue read requests to the (external) memory, e.g. in respect of the data required to produce each block of data. In this case, read requests in respect of the blocks of data of (at least) the first and second rows of blocks of data are in an embodiment grouped together (clustered or bundled), i.e. are in an embodiment issued at the same time or time period (e.g. substantially continuously, one after another).

In an embodiment, the (first processing stage of the) data processing system comprises a decoder, in an embodiment an ARM Frame Buffer Compression (AFBC) (or other block-based encoding scheme) decoder (AFBC is described in US A1 2013/0034309), which is operable to decode (decompress) data such as one or more received (in an embodiment AFBC) encoded (compressed) blocks of data, which are, e.g., read from memory. Accordingly, in this embodiment, the (first processing stage of the) data processing system comprises a (AFBC) decoder that decodes and/or decompresses (blocks of) (AFBC) encoded data to produce decoded and/or decompressed (e.g. colour) data (e.g. blocks of decoded and/or decompressed data).

In another embodiment, the (first processing stage of the) data processing system comprises a rotation stage which is operable to rotate data such as one or more received blocks of data, which are, e.g., read from memory. Accordingly, in this embodiment, the (first processing stage of the) data processing system comprises a rotation stage that rotates (blocks of) (e.g. colour) data to produce rotated data (e.g. blocks of rotated data).

The data produced by the data processing system is stored in a memory of the data processing system. The (first processing stage of the) data processing system in an embodiment writes the data to the memory, e.g. for reading in the form of lines.

Each block of data is in an embodiment stored in the memory when it is produced. Accordingly, where, as described above, the blocks of data are produced in a block by block manner, then the blocks of data are in an embodiment stored in the memory in a block by block manner, i.e. from one block to the next, in an embodiment in raster line (row by row) order.

The memory is in an embodiment a (local) buffer memory of the data processing system. Thus, the data should be (and is in an embodiment) not (is other than) written out (e.g. to external memory or otherwise) from the processor in question (e.g. display controller), but in an embodiment instead remains internal to the processor in question (e.g. display controller), when it is stored in (and read from) the memory.

In an embodiment the memory forms part of a “de-tiler”, e.g. of a display controller, operable to convert data received in the form of one or more blocks (tiles) to data in the form of lines, e.g. for further processing and/or display. Thus, in an embodiment, the data processing system comprises a de-tiler, and the memory comprises a buffer memory of the de-tiler.

The memory should (and in an embodiment does) have a particular size, i.e. a total number of memory locations, e.g. for storing the produced data. Each of the memory locations in an embodiment has its own memory address. Each of the memory locations may be configured to store a single data word, but other arrangements would be possible.

It will be appreciated that since, in de-tiler arrangements, data is produced in the form of blocks (tiles) and then read in the form of lines, an appropriate (e.g. a full) line width of blocks (tiles) should be (and in an embodiment is) produced and stored in the memory before the set of data can be read “independently”, i.e. before the data can be read in lines without requiring the production of further data while the set of data is being read. That is, at least one row of blocks of data should be (and in an embodiment is) produced and stored, where the row of blocks of data has a length (width) equal to the length (width) of each line of the overall data array.

Accordingly, the memory should be (and in an embodiment is) able to store at least a full line width (a full row) of blocks of data, e.g. in respect of the maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce (support). Thus, the memory in an embodiment has a size (i.e. a total number of memory addresses for storing the produced data) that is sufficient to store at least one (full) row of blocks of data for the maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce.
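The sizing requirement can be made concrete with a short calculation. The following is a minimal sketch that assumes one data word per memory location; the function name and the example figures (a 4096-position maximum width, 16×16 blocks, 4 bytes per data position and 16-byte words) are hypothetical values chosen for illustration, not parameters taken from any embodiment.

```python
# Illustrative sizing calculation; all parameter values are assumptions.

def min_buffer_words(max_width, block_w, block_h, bytes_per_position, word_bytes):
    """Words needed to hold one full row of blocks at the maximum width."""
    # Ceiling division so partial blocks and partial words are counted in full.
    blocks_per_row = -(-max_width // block_w)
    words_per_block = -(-(block_w * block_h * bytes_per_position) // word_bytes)
    return blocks_per_row * words_per_block


example_words = min_buffer_words(
    max_width=4096, block_w=16, block_h=16, bytes_per_position=4, word_bytes=16
)
```

Under these assumed figures there are 256 blocks per row and 64 words per block, so the memory would need at least 16384 word locations to hold one full row of blocks.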

In an embodiment, the memory is able to store only one full line width (one row) of blocks of data, e.g. in respect of the maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce (support) (it should be noted here that, as will be described in more detail below, in these embodiments, each of the first and second rows of data blocks that are stored in the memory in an embodiment has a size that is less than or equal to half of the maximum size (resolution) that the data processing system is configured to produce). That is, the memory in an embodiment has a size (i.e. a total number of memory addresses for storing the data) that is sufficient to store only (i.e. that is not (is other than) larger than is necessary to store) one (full) row of blocks of data for the maximum output data array size (resolution) that the data processing system is configured to produce. This beneficially means that the size of the memory that is provided for storing the data is minimised (or at least reduced).

The maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce (support) may be selected as desired. It may be, for example, 8K, 4K, or HD, but any other arrangement would also be possible.

The manner in which the data is stored in the memory will be described in more detail below.

In the technology described herein, the data produced in the form of blocks is read in the form of lines, e.g. of the output data array. The data is in an embodiment read in raster (line) order, i.e. from line to line.

Thus, where the or each array of data produced by the (first processing stage of the) data processing system comprises an array of plural data positions, the array of data is in an embodiment read from data position to data position in raster line order. Correspondingly, where, as discussed above, the data comprises plural data words, the data words are in an embodiment used in raster (line) order, i.e. from word to word in raster line order.

Each line of data should, and in an embodiment does, comprise a row of data positions of the data array that is one data position high and many data positions wide (long). Each line in an embodiment has a width corresponding to (equal to) the width of the overall array of data produced by the (first processing stage of the) data processing system. Each line is in an embodiment read in full, but it would also be possible to only read a fraction of one or more or each line, e.g. half a line, etc.

Thus, in an embodiment, the data processing system is configured to produce regions in the form of two-dimensional blocks (arrays) of data positions (i.e. “tiles”) of an output array (i.e. regions whose height and width are each greater than a single data position), to write those regions (blocks of data) to the memory, and to read the data array from the memory in raster order, i.e. in the form of lines (rows of data positions that are one data position high and many data positions wide (long)).
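The block-write and line-read behaviour can be sketched in a few lines. This is a minimal illustration only: it stores a single row of blocks at contiguous addresses with one data position per memory location, and the function names and the tiny 2×2 block size are assumptions made to keep the example readable.

```python
# Illustrative de-tiling of one row of blocks; layout and names are assumed.

def write_block(buffer, block, block_index, block_w, block_h):
    """Store one block_h x block_w block at its contiguous slot in the buffer."""
    base = block_index * block_w * block_h
    for y in range(block_h):
        for x in range(block_w):
            buffer[base + y * block_w + x] = block[y][x]


def read_line(buffer, line, blocks_per_row, block_w, block_h):
    """Gather one raster line by visiting the same line of every block."""
    out = []
    for b in range(blocks_per_row):
        start = b * block_w * block_h + line * block_w
        out.extend(buffer[start:start + block_w])
    return out


# A 4-wide, 2-high array whose data position (x, y) holds the value y * 4 + x,
# produced as two 2x2 blocks.
block_w, block_h, blocks_per_row = 2, 2, 2
buffer = [None] * (blocks_per_row * block_w * block_h)
write_block(buffer, [[0, 1], [4, 5]], 0, block_w, block_h)
write_block(buffer, [[2, 3], [6, 7]], 1, block_w, block_h)

line0 = read_line(buffer, 0, blocks_per_row, block_w, block_h)
line1 = read_line(buffer, 1, blocks_per_row, block_w, block_h)
```

Each line read touches every block in the row, which is why a full row of blocks has to be present in the buffer before a complete line can be read out.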

The data processing system of the technology described herein may be configured to read the data from the memory in the form of lines in any suitable manner.

The data processing system may comprise a second processing stage operable to read the data in the form of lines. The second processing stage may comprise or may form part of, e.g., a de-tiler (e.g. of a display controller), e.g. that is operable to convert data in the form of one or more blocks of data (tiles) to data in the form of one or more lines of data, e.g. for further processing and/or display.

There may be a single second processing stage or there may be plural second processing stages. Where there are plural second processing stages, then each second processing stage is in an embodiment operated in the manner of the second processing stage described above.

The manner in which the data is read in the form of lines will be described in more detail below.

In an embodiment, the data processing system is operable to process the read data.

In these embodiments, where, as discussed above, the data comprises plural data words, the data processing system in an embodiment reads the data words in raster (line) order, i.e. from word to word in raster line order, and then processes each word.

In an embodiment, the data processing system comprises a (pixel) unpacking processing stage operable to extract data in respect of each of plural individual data positions (pixels) from each of plural data words (i.e. where each data word comprises data in respect of multiple data positions of the overall data array).
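Unpacking of this kind can be sketched as simple shift-and-mask extraction. The word layout used here (four 8-bit data positions per 32-bit word, with the first data position in the least significant bits) is purely an assumption for illustration; the actual word width and packing order are not specified in the passage above.

```python
# Illustrative pixel unpacking; the word format is a hypothetical example.

def unpack_word(word, bits_per_position=8, positions_per_word=4):
    """Split one packed data word into its individual data position values."""
    mask = (1 << bits_per_position) - 1
    return [
        (word >> (i * bits_per_position)) & mask
        for i in range(positions_per_word)
    ]


def unpack_line(words, bits_per_position=8, positions_per_word=4):
    """Unpack a raster line supplied as a sequence of packed words."""
    positions = []
    for word in words:
        positions.extend(unpack_word(word, bits_per_position, positions_per_word))
    return positions


pixels = unpack_word(0x44332211)
line_pixels = unpack_line([0x04030201, 0x08070605])
```

Reading proceeds word by word, while later processing can then operate on the extracted values one data position at a time.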

The data in respect of each data position (pixel) may then be processed further. For example, in an embodiment, the data processing system may also comprise one or more layer or pixel processing pipelines operable to process the data in respect of each data position (pixel) as appropriate, e.g. for display.

Accordingly, in these embodiments, the (first processing stage of the) data processing system in an embodiment produces data in the form of plural data words (where each data word comprises data in respect of multiple data positions of the overall data array), the data is read on a word by word basis, and is then used (processed) on a data position by data position (e.g. pixel by pixel) basis.

The data processing system may also or instead be operable to cause at least some of the data and/or at least some of the processed data to be displayed. In an embodiment, the (display controller of the) data processing system is operable to provide the data (directly) to a display for display.

To facilitate this, the data processing system in an embodiment comprises an output stage operable to provide an image for display to a display. This output stage may be any suitable such output stage operable to provide an image for display to a display, e.g. to cause an image for display to be displayed on a display (to act as a display interface). The output stage in an embodiment comprises appropriate timing control functionality (e.g. it is configured to send pixel data to the display with appropriate horizontal and vertical blanking periods) for the display.

In an embodiment, the method of the technology described herein is (only) performed (and the data processing system is configured to operate in the manner of the technology described herein) when the (horizontal) size (resolution) of the output data array is less than or equal to half of the maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce (support).

Where, as described above, the memory is able to store only one full line width (one row) of blocks of data of the maximum output (horizontal) data array size (resolution) that the data processing system is configured to produce, then this will accordingly mean that the combined size of the first and the second rows of data is less than or equal to the size of the memory. Correspondingly, each of the first and the second row of data blocks will have a size that is less than or equal to half the size of the memory.

Thus, according to an embodiment, the method of the technology described herein comprises (and the data processing system is configured to store the data in the memory by):

when each row of data blocks has a size that is less than or equal to half the size of the memory, storing each block of data of a set of plural rows of blocks of data (i.e. at least the first and second rows of blocks of data) in the memory, e.g. in the manner of the technology described herein.

On the other hand, when the (horizontal) size (resolution) of the output data array is greater than half of the maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce (i.e. when each row of data blocks of the output data array has a size that is greater than half the size of the memory), the method in an embodiment comprises (and the data processing system is configured to store the data in the memory by) storing each block of data of a (single) row of blocks of data in the memory. In this case, the (adjacent) data blocks of each row of data blocks are in an embodiment stored at adjacent memory addresses in the sequence of memory addresses.

In these embodiments, it would be possible to only ever store two rows of blocks of data in the memory, i.e. whenever it is possible to do so (e.g. since each row of data blocks of the output data array has a size that is less than or equal to half the size of the memory) (and in one embodiment this is done), and to store one row of data blocks whenever this is not the case. However, in an embodiment, one or more other rows of blocks of data, e.g. of the output data array, may also be stored in the memory together with the first and second rows, e.g. and in an embodiment, when it is possible to do so.

In an embodiment, the data processing system is operable to select the number of rows of data blocks of the data array to store in the memory, in an embodiment depending on the (horizontal) size of the data array (i.e. depending on the size of each row of data blocks), and in an embodiment to then store that number of rows of blocks in the memory.

The number of rows of blocks is in an embodiment selected depending on how many rows of the data array it is possible to store in the memory.

Thus, in an embodiment, where the (horizontal) size (resolution) of the output data array is less than or equal to a particular fraction (1/n) of the maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce (support) (i.e. when each row of data blocks of the output data array has a size that is less than or equal to a particular fraction (1/n) of the size of the memory), then each block of data of n rows (first to nth rows) of blocks of data is stored in the memory, in an embodiment respectively at one or more memory addresses of n (first to nth) sets of different memory addresses of the sequence of memory addresses for the memory.

Correspondingly, where the (horizontal) size (resolution) of the output data array is greater than a particular fraction (1/n) of the maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce (support) (i.e. when each row of data blocks of the output data array has a size that is greater than a particular fraction (1/n) of the size of the memory), then each block of data of less than n rows of blocks of data is in an embodiment stored in the memory.

In these embodiments, the number (n) should be (and in an embodiment is) a positive integer, but may otherwise be selected as desired. For example, according to various embodiments n=1, 2, 3, 4, 5, 6, 7, 8, etc.

According to an embodiment, n is a power of two, e.g. n=1, 2, 4, 8, 16, etc. Constraining n to be a power of two can simplify the operation of the data processing system.

According to an embodiment, n is constrained to be one of a set of possible values. For example, and in one embodiment, n may be constrained to be one of 1, 2, 4 and 8. Again, constraining n in this manner can simplify the operation of the data processing system.
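The selection of n described above amounts to choosing the largest permitted value for which n rows of blocks still fit in the memory. The following is a minimal sketch under that reading; the function name, the use of word counts as the unit of size, and the example memory size of 1024 words are assumptions made for illustration.

```python
# Illustrative selection of the number of rows of blocks to store together.

def select_row_count(row_words, memory_words, allowed=(1, 2, 4, 8)):
    """Return the largest allowed n such that n rows of blocks fit in memory."""
    if row_words <= 0 or row_words > memory_words:
        raise ValueError("a single row of blocks must fit in the memory")
    return max(n for n in allowed if n * row_words <= memory_words)


memory_words = 1024  # assumed: exactly one row at the maximum supported width
choices = {
    row_words: select_row_count(row_words, memory_words)
    for row_words in (1024, 600, 512, 300, 256, 200, 128, 100)
}
```

With the assumed 1024-word memory, a row larger than half the memory gives n=1, a row between a quarter and a half gives n=2, a row between an eighth and a quarter gives n=4, and anything smaller gives n=8, which matches the thresholds set out above for the constrained set of 1, 2, 4 and 8.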

Thus, in an embodiment, the method comprises (and the data processingsystem is configured to store the data in the memory by):

when the (horizontal) size (resolution) of the output data array is greater than half of the maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce (support) (i.e. when each row of data blocks of the output data array has a size that is greater than half the size of the memory):

storing each block of data of one (a first) row of blocks of data in the memory in an embodiment at one or more memory addresses of the sequence of memory addresses for the memory.

Similarly, in an embodiment, the method comprises (and the data processing system is configured to store the data in the memory by):

when the (horizontal) size (resolution) of the output data array is less than or equal to half of the maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce (support) (i.e. when each row of data blocks of the output data array has a size that is less than or equal to half of the size of the memory); and when the (horizontal) size (resolution) of the output data array is greater than a quarter of the maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce (support) (i.e. when each row of data blocks of the output data array has a size that is greater than a quarter of the size of the memory):

storing each block of data of two (a first and a second) rows of blocks of data in the memory in an embodiment respectively at one or more memory addresses of two (first and second) sets of different memory addresses of the sequence of memory addresses for the memory.

Similarly, in an embodiment, the method comprises (and the data processing system is configured to store the data in the memory by):

when the (horizontal) size (resolution) of the output data array is less than or equal to a quarter of the maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce (support) (i.e. when each row of data blocks of the output data array has a size that is less than or equal to a quarter of the size of the memory); and when the (horizontal) size (resolution) of the output data array is greater than an eighth of the maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce (support) (i.e. when each row of data blocks of the output data array has a size that is greater than an eighth of the size of the memory):

storing each block of data of four (first to fourth) rows of blocks of data in the memory in an embodiment respectively at one or more memory addresses of four (first to fourth) sets of different memory addresses of the sequence of memory addresses for the memory.

Similarly, the method in an embodiment comprises (and the data processing system is configured to store the data in the memory by):

when the (horizontal) size (resolution) of the output data array is less than or equal to an eighth of the maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce (support) (i.e. when each row of data blocks of the output data array has a size that is less than or equal to an eighth of the size of the memory):

storing each block of data of eight (first to eighth) rows of blocks of data in the memory in an embodiment respectively at one or more memory addresses of eight (first to eighth) sets of different memory addresses of the sequence of memory addresses for the memory.
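The threshold rules above can be sketched as a small selection function. This is only an illustrative sketch: the function name `rows_to_store`, its parameters, and the use of widths measured in data positions are assumptions made for the example, and the set of permitted values (1, 2, 4 and 8) is simply the example set mentioned above.

```python
def rows_to_store(width, max_width, allowed=(8, 4, 2, 1)):
    """Pick n, the number of block rows stored in the memory together.

    Hypothetical helper: returns the largest permitted n for which the
    output width is less than or equal to 1/n of the maximum supported
    width. `allowed` must be listed largest first and should contain 1.
    """
    if not 0 < width <= max_width:
        raise ValueError("width must be in the range 1..max_width")
    for n in allowed:
        # width <= max_width / n, written without division
        if width * n <= max_width:
            return n
    raise ValueError("no permitted value of n fits; include 1 in `allowed`")


if __name__ == "__main__":
    for w in (3840, 1921, 1920, 960, 480):
        print(w, rows_to_store(w, 3840))
```

Listing the permitted values largest first means the first match is the largest n that fits, which mirrors the "less than or equal to half, but greater than a quarter" style of the conditions above.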

In the technology described herein, each block of data of plural rows of blocks of data is stored in the memory. Each row of data blocks should be (and in an embodiment is) a respective (full) row of data blocks of the output data array.

The first and second rows of data blocks of the technology described herein may each comprise any one of the plural rows of data blocks of the output array. For example, the first and second rows of data blocks may respectively comprise the first and second rows of data blocks of the data array to be produced (i.e. the “top” row and the next row), or they may comprise any other of the rows of data blocks of the output array.

The plural (e.g. first and second) rows of blocks of data are in an embodiment stored in the memory such that they are (eventually) present in the memory together, i.e. at the same time.

The plural (e.g. first and second) rows of data blocks that are stored in the memory are in an embodiment adjacent rows of data blocks of the output data array. Thus, in an embodiment, the blocks of data of plural adjacent rows of blocks of data of the overall output data array are stored in the memory together.

Where, as described above, the blocks of data are produced in a block by block manner, i.e. from one block to the next (in raster line order), then each of the blocks of the first row will be (and in an embodiment is) produced and stored in the memory, and then each of the blocks of the second row will be (and in an embodiment is) produced and stored in the memory (and optionally then the next row, and so on).

In the technology described herein, each block of data of the plural (i.e. at least the first and the second) rows of blocks of data is stored in the memory at one or more particular memory addresses of a sequence of memory addresses for the memory.

The sequence of memory addresses for the memory can have any suitable and desired form. For example, the sequence of memory addresses may have a regular, in an embodiment linear, form, e.g. where the sequence of memory addresses increases monotonically. The sequence of memory addresses may, for example, begin with the first memory address for the memory and end with the last memory address for the memory.

Indeed, in an embodiment, the sequence of memory addresses initially has such a form, e.g. where the memory is empty and/or for storing the first row(s) of data of an array of data to be produced. (Although this need not be the case.)

It would also be possible for the sequence of memory addresses to be fixed, i.e. unchanging.

However, according to an embodiment, the sequence of memory addresses is not (always) regular or fixed in this manner, but is in an embodiment instead (at least in part) irregular, and, e.g., can change. In an embodiment, the sequence of memory addresses comprises a sequence of virtual memory addresses.

In this regard, the Applicants have recognised that the process of reading the data in the form of lines effectively frees up space within the memory that can be used to store additional data. In particular, when an amount of data has been read by the data processing system (in the form of lines) that corresponds to a block of data (e.g. when N data words have been read by the data processing system in the form of lines), a new block of data could be stored in the memory, i.e. using the “freed up” memory addresses.

Storing new blocks of data using the memory addresses “freed up” by the reading process has the effect of reducing latency in the system (in particular where, as described above, the memory of the technology described herein is able to store only one full line width (i.e. one row) of blocks of data for the maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce), since blocks of data can be produced and stored in the memory substantially at the same time as data is being read from the memory in the form of lines.

However, since the form in which the data is read (i.e. lines) is different to the form in which it is produced (i.e. blocks), the sequence of memory addresses will not typically comprise a “linear” or fixed sequence of memory addresses, but will instead comprise a more complex sequence of memory addresses that, e.g., depends on the relationship between the blocks of data and the lines of data.

Thus, according to an embodiment, the data processing system is configured to read data from the memory in the form of lines by reading data (e.g. data words) from a read sequence of memory addresses for the memory, e.g. where the read sequence of memory addresses is in an embodiment arranged to appropriately take into account the relationship between the blocks of data and the lines of data, and the sequence of memory addresses of the technology described herein in an embodiment corresponds (at least in part) to (is at least in part the same as) this read sequence of memory addresses.

Thus, the sequence of memory addresses in an embodiment corresponds (at least in part) to a read sequence of memory addresses for reading data from the memory in the form of lines, in an embodiment for reading one or more other, in an embodiment previously produced, rows of data blocks (e.g. of the data array) in the form of lines.

In an embodiment, the sequence of memory addresses of the technology described herein initially comprises a regular, in an embodiment linear, sequence (e.g. where the sequence of memory addresses increases monotonically), e.g. for storing the first set of plural rows of data blocks of the data array, and then comprises a read sequence of memory addresses (for reading data from the memory in the form of lines), e.g. for storing subsequent sets of plural rows of data blocks of the data array.

The read sequence of memory addresses will be described in more detail below.

Each of the blocks of data of each row of blocks of data is stored at one or more memory addresses of a respective set of memory addresses of the sequence of memory addresses for the memory.

The blocks of data of each row should be (and in an embodiment are) stored at different memory addresses in the sequence of memory addresses for the memory. This will accordingly mean that each of the plural (e.g. at least the first and second) rows of blocks of data can be (and in an embodiment are) stored in the memory together, i.e. at the same time.

Each block of data should be (and in an embodiment is) stored at a contiguous set of plural memory addresses in the sequence of memory addresses. In an embodiment, where each block of data comprises plural data words (e.g. N data words), then each data word of each block is stored at a particular (single) memory address, and the data words of each block of data are in an embodiment stored at a group of contiguous (adjacent) memory addresses in the sequence of memory addresses. Other arrangements would be possible.

In the technology described herein, at least some of the memory addresses of the second set of memory addresses fall between memory addresses of the first set of memory addresses in the sequence of memory addresses for the memory. In an embodiment, one or more or each of the blocks of data of the second row of blocks of data are stored at memory addresses that fall between memory addresses at which blocks of data of the first row are stored in the sequence of memory addresses for the memory.

Correspondingly, in an embodiment, at least some of the memory addresses of the first set of memory addresses fall between memory addresses of the second set of memory addresses in the sequence of memory addresses for the memory. In an embodiment, one or more or each of the blocks of data of the first row of blocks of data are stored at memory addresses that fall between memory addresses at which blocks of data of the second row are stored in the sequence of memory addresses for the memory.

In an embodiment, each block of data of each row is stored at a memory address or memory addresses in the sequence of memory addresses that are separated from the memory addresses at which other blocks in that row are stored. Thus, where each block of data is stored at a contiguous set of memory addresses in the sequence of memory addresses, then the first and second sets of memory addresses will each comprise plural separated contiguous groups of memory addresses in the sequence of memory addresses. In other words, each block of data is in an embodiment stored at a contiguous set of (N) memory addresses in the sequence of memory addresses, and adjacent blocks of data of each row are in an embodiment stored at memory addresses that are separated (non-adjacent) in the sequence of memory addresses.

The number of memory addresses in the sequence of memory addresses that separates adjacent blocks of data of each row can be selected as desired. Adjacent blocks of data of each row are in an embodiment separated by the same number of memory addresses in the sequence of memory addresses.

In an embodiment, each block of data of each row is separated from the adjacent block or blocks of that row by a number of memory addresses in the sequence of memory addresses that is equal to an integer multiple of the number of memory addresses required to store a block of data.

Thus, for example, where each block of data requires N memory addresses to be stored (e.g. in respect of N data words), then each of the data blocks of each row is in an embodiment separated from the adjacent block or blocks of that row by an integer multiple of N memory addresses in the sequence of memory addresses. It will be appreciated that this arrangement means that an integer number of (other) blocks of data can be (and in an embodiment are) stored at the memory addresses between the memory addresses at which adjacent data blocks of the same row are stored, thereby making efficient use of the available memory resources.

In an embodiment, the number of memory addresses in the sequence of memory addresses that separates adjacent blocks of data of each row is selected depending on the number of rows of data blocks (n) that are to be stored in the memory together, e.g. and therefore depending on the (horizontal) size (resolution) of the output data array, i.e. depending on the size of each row of data blocks.

In an embodiment, adjacent blocks of data of each row are stored at memory addresses in the sequence of memory addresses that are separated by a number of memory addresses in the sequence of memory addresses that is sufficient to store one block of data from each of the other rows that are to be stored in the memory together with the data block row in question. In other words, the number of memory addresses in the sequence of memory addresses that separates adjacent blocks of data of each row is in an embodiment equal to the number of memory addresses required to store n−1 blocks of data (where n is the number of rows of data blocks that are to be stored in the memory together), e.g. N(n−1) memory addresses in the sequence of memory addresses.

Thus, for example, when two rows of data blocks are to be stored in the memory together (e.g. when the (horizontal) size (resolution) of the output data array is less than or equal to half of the maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce), then adjacent blocks of data of each row of data blocks (of each of first and second rows) are in an embodiment stored at memory addresses in the sequence of memory addresses that are separated by N memory addresses in the sequence of memory addresses.

Similarly, when four rows of data blocks are to be stored in the memory together (e.g. when the (horizontal) size (resolution) of the output data array is less than or equal to a quarter of the maximum (horizontal) output data array size (resolution) that the data processing system is configured to produce), then adjacent blocks of data of each row of data blocks are in an embodiment stored at memory addresses in the sequence of memory addresses that are separated by 3N memory addresses, and so on.

In these embodiments, where adjacent data blocks of each row of data blocks are stored at memory addresses that are separated in the sequence of memory addresses by a memory address “gap”, then the blocks of data of each of the other rows of data that are to be stored in the memory together with the data block row in question are in an embodiment stored at the memory addresses in the “gaps”. In an embodiment, for each data block row, one block of data of each of the other data block rows is stored between each data block of the row in question in the sequence of memory addresses.

Each set of plural rows of blocks of data (that are to be stored in the memory together) is in an embodiment stored in the memory such that the respective blocks of data of each row are interleaved in the sequence of memory addresses for the memory. That is, in the sequence of memory addresses for the memory, a first block of a first row is stored, followed by a first block of a second row, optionally followed by a first block of a third row, etc., and then a second block of the first row is stored, followed by a second block of the second row, optionally followed by a second block of the third row, etc.

Accordingly, in an embodiment, although as described above the blocks of data are in an embodiment produced in a block by block manner in raster line order (i.e. where the blocks of one row are produced in order, followed by the blocks of the next row, etc.), the blocks of data of each set of plural rows (that are to be stored in the memory together) are in effect stored in the memory at memory addresses in the sequence of memory addresses that follow a column by column order (i.e. where the blocks of one column of the set of plural rows are stored in order, followed by the blocks of the next column, etc.).

Thus, for example, where two (first and second) rows of blocks of data are to be stored in the memory together, then the data blocks of the first row of blocks are in an embodiment stored in the memory at memory addresses in the sequence of memory addresses that are each separated by N memory addresses in the sequence of memory addresses. Each of the data blocks of the second row of blocks is in an embodiment then stored in the memory at the memory address “gaps” in the sequence of memory addresses between each of the blocks of data of the first row. Accordingly, the first and second rows of data blocks are in an embodiment stored in the memory such that their respective blocks of data are stored in an interleaved manner in the sequence of memory addresses for the memory.

Similarly, where four rows of blocks of data are to be stored in the memory together, then the data blocks of the first row of blocks are in an embodiment stored in the memory at memory addresses in the sequence of memory addresses that are separated by 3N memory addresses in the sequence of memory addresses. One data block from each of the second, third and fourth rows is in an embodiment then stored in the memory in the memory address “gaps” in the sequence of memory addresses between each of the blocks of data of the first row, in an embodiment in order. Accordingly, the first, second, third, and fourth rows of blocks of data are in an embodiment stored such that their respective blocks of data are stored in an interleaved manner in the sequence of memory addresses for the memory.
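The interleaved arrangement can be illustrated with a short sketch. It assumes the simplest case, in which the sequence of memory addresses is linear and starts at zero; the function names `block_start` and `gap_between_adjacent_blocks` are hypothetical, and the returned values are positions in the sequence of memory addresses rather than necessarily physical addresses.

```python
def block_start(row, col, n, N):
    """Position in the address sequence of the first word of a block.

    row: index (0..n-1) of the block row within the set stored together
    col: index of the block within its row
    n:   number of block rows stored in the memory together
    N:   number of memory addresses (data words) per block
    Blocks are laid out column by column: row 0 of column 0, row 1 of
    column 0, ..., then row 0 of column 1, and so on.
    """
    return (col * n + row) * N


def gap_between_adjacent_blocks(n, N):
    """Addresses separating adjacent blocks of the same row."""
    end_of_first = block_start(0, 0, n, N) + N
    return block_start(0, 1, n, N) - end_of_first


if __name__ == "__main__":
    N = 4
    for n in (2, 4):
        order = sorted(
            ((block_start(r, c, n, N), (r, c)) for r in range(n) for c in range(3))
        )
        print(n, gap_between_adjacent_blocks(n, N), order)
```

Under these assumptions the gap evaluates to N(n−1), i.e. N for two rows and 3N for four rows, which agrees with the separations given above.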

In the technology described herein the data stored in the memory is read in the form of lines, e.g. of the output data array. This in an embodiment involves reading the data from the memory in a line-by-line order.

Thus, the data processing system is in an embodiment configured to read some but not all of the data positions from multiple stored blocks of data consecutively (i.e. rather than reading a complete block before starting the next block), e.g. and in an embodiment, line by line, where each line comprises a concatenated respective row of data positions from plural different blocks of data. Thus, each line of data positions that is read by the data processing system is in an embodiment taken from plural different blocks of data (tiles) stored in the memory.

In an embodiment, the data positions from each corresponding line of each of plural blocks of data of each row of blocks of data are read consecutively, i.e. one after another before moving onto the next line. For example, the data positions in the top line of each of the blocks in a row of blocks can be read consecutively. The data positions in the second lines of each of the blocks in the row of blocks can then be read consecutively (together), and so on.

However, since, as described above, the form in which the data is read (i.e. lines) is different to the form in which it is produced (i.e. blocks) (and since the blocks of data of plural different rows of blocks of data are stored in a “mixed in” manner), the sequence of memory addresses for the memory that should be read in order to properly read the data in the form of lines will not typically comprise a “linear” or fixed sequence of memory addresses, but will instead comprise a more complex sequence of memory addresses that, e.g., depends on the relationship between the blocks of data and the lines of data.

On the other hand, since the size of the blocks of data, the size of the lines of data, and the size of the memory are in an embodiment all fixed, then the read sequence of memory addresses for the memory will (and in an embodiment does) follow a particular, e.g. deterministic, pattern.

Thus, in an embodiment, reading the data in the form of lines comprises reading a, in an embodiment predetermined, sequence of memory addresses (a read sequence of memory addresses) for the memory.

In this regard, the Applicants have recognised that in order to read each line of data (where each line comprises a concatenated respective line of data positions from plural different blocks of data), the appropriate read sequence of memory addresses will (and in an embodiment does) comprise plural separated memory addresses or plural separated sets of contiguous memory addresses, e.g. where each memory address or each set of contiguous memory addresses corresponds to each line of data positions from each of the different blocks of data.

Accordingly, reading the data in the form of lines should (and in an embodiment does) comprise reading a memory address or a contiguous set of memory addresses in order, “skipping” a particular number of memory addresses of the memory addresses for the memory, and then reading another memory address or contiguous set of memory addresses, “skipping” a particular number of memory addresses, and then reading another memory address or contiguous set of memory addresses, and so on.

Moreover, the Applicants have recognised that where the data is stored in the memory in the manner of the embodiments, each of the memory addresses or sets of contiguous memory addresses should be (and in an embodiment are) separated by a particular, in an embodiment selected, number of memory addresses (“memory address skip”). The size of this memory address skip is in an embodiment the same in respect of each row or set of plural rows of data blocks that is stored in the memory.

Thus, when a particular row or set of plural rows of data blocks has been stored in the memory, the row or set of plural rows of data blocks is in an embodiment read in the form of lines by reading plural separated memory addresses or plural separated sets of contiguous memory addresses, where each memory address or set of contiguous memory addresses is separated by a particular, in an embodiment selected, in an embodiment constant, number of memory addresses.

The Applicants have furthermore recognised that the form of this read sequence of memory addresses is essentially the same for each successive row or set of plural rows of data blocks that is stored in the memory, regardless of the fact that the sequence of memory addresses at which each new row of data blocks is stored is in an embodiment “non-linear” as discussed above. In particular, the only change that is required to the read sequence of memory addresses between successive data block rows or between successive sets of plural data block rows is in an embodiment in the size of the memory address skip.

Moreover, the Applicants have recognised that the effect of “mixing in” the data blocks of plural data block rows in the manner of the embodiments is again that the form of the read sequence of memory addresses is essentially the same, regardless of the number of rows of data blocks (n) that are stored in the memory. In particular, the only change that is required to the read sequence of memory addresses when the number of rows of data blocks (n) that are stored in the memory is changed is in an embodiment again in the size of the memory address skip. Furthermore, the necessary change to the size of the memory address skip is in an embodiment an integer multiple, which in an embodiment depends on the number of rows of data blocks (n) that are to be stored in the memory.

It will be appreciated that this means that the process of reading the data in the form of lines is relatively simple and uniform, e.g. irrespective of the number of rows of data blocks that are to be stored in the memory. This in turn means that the processing circuitry provided for this purpose can beneficially be less complex and will therefore require less chip area and consume less power.

Thus, in an embodiment, reading the data from the memory in the form of lines comprises reading data from plural sets of contiguous memory addresses, wherein each contiguous set of memory addresses is separated by a particular, in an embodiment selected, number of memory addresses (“memory address skip”).

Each set of contiguous memory addresses in an embodiment comprises the same number of memory addresses, i.e. the number that in an embodiment corresponds to a single line of data positions of a single block of data.
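As a rough illustration of reading with a memory address skip, the sketch below stores one set of n block rows in interleaved order and then reads it back as lines. It covers only the initial case in which the sequence of memory addresses is linear, and it assumes that the words of each block are stored line by line, `w` words per block line; these assumptions, and the names `store_blocks` and `read_lines`, are introduced for the example only. The skip for later sets of rows, which are stored at freed-up addresses, is not derived here.

```python
def store_blocks(blocks, n, cols, N):
    """Store n rows of `cols` blocks, interleaved column by column.

    blocks[r][c] is a list of N words for block row r, block column c.
    Blocks arrive in raster order (all of row 0, then row 1, ...).
    """
    memory = [None] * (n * cols * N)
    for r in range(n):
        for c in range(cols):
            start = (c * n + r) * N
            memory[start:start + N] = blocks[r][c]
    return memory


def read_lines(memory, n, cols, N, w):
    """Yield output lines by reading runs of w words separated by a skip."""
    lines_per_block = N // w
    skip = n * N - w  # addresses skipped between consecutive runs
    for r in range(n):
        for line_in_block in range(lines_per_block):
            addr = r * N + line_in_block * w
            line = []
            for _ in range(cols):
                line.extend(memory[addr:addr + w])  # one contiguous run
                addr += w + skip                    # jump to the next block
            yield line
```

In this sketch the skip is n·N − w, so changing n changes only the size of the skip and not the form of the read loop.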

The size of the memory address skip is in an embodiment the same in respect of each row or set of plural rows of data blocks that is stored in and read from the memory. However, the size of the memory address skip may change between successive rows or sets of plural rows of data blocks that are stored in and read from the memory.

Thus, successive rows or sets of plural rows of data blocks, e.g. of the output data array, are in an embodiment successively stored in and read from the memory in the manner described above (e.g. beginning with the top data block row or set of adjacent data block rows of the output data array, followed by the next row or set of rows, etc.), where the size of the memory address skip used in the reading in an embodiment depends on (i.e. changes in dependence on) which of the data block rows or sets of data block rows of the data array is being read.

The size of the memory address skip in an embodiment also depends on (i.e. changes in dependence on) the number of data block rows (n) that are stored in the memory. Thus, the size of the memory address skip is in an embodiment a function of both the number of data block rows (n) that are stored in the memory, and which of the data block rows or sets of data block rows of the data array is being read.

In these embodiments, the read sequence of memory addresses should be (and in an embodiment is) configured to “wrap around” the memory addresses of the memory. That is, where “skipping” a particular number of memory addresses would result in a memory address that is outside of the range of memory addresses of the memory, the read sequence of memory addresses in an embodiment returns to the beginning of the memory addresses after passing the last memory address for the memory. A modulo operation is in an embodiment used to achieve this. This ensures that the data is appropriately read out in the form of lines.
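The wrap-around behaviour can be sketched as follows. The function `read_addresses` and its parameters are hypothetical and are used only to show how a modulo operation keeps a skipping read sequence within the range of memory addresses; the particular skip values that would be used for a given data block row are not derived here.

```python
def read_addresses(start, run, skip, count, mem_size):
    """Generate a read sequence of `count` contiguous runs.

    start:    first memory address to read
    run:      number of contiguous addresses read each time
    skip:     number of addresses skipped between runs
    count:    number of runs to read
    mem_size: number of addresses in the memory; addresses wrap modulo this
    """
    addresses = []
    addr = start % mem_size
    for _ in range(count):
        for i in range(run):
            addresses.append((addr + i) % mem_size)
        # advance past the run and the skip, wrapping back to the start
        addr = (addr + run + skip) % mem_size
    return addresses


if __name__ == "__main__":
    # Runs of 2 addresses with a skip of 6 in a 20-address memory:
    # the third run would start at address 20, so it wraps to address 0.
    print(read_addresses(4, 2, 6, 3, 20))  # [4, 5, 12, 13, 0, 1]
```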

It is believed that the idea of reading data in the form of lines from a memory by reading contiguous sets of memory addresses that are separated by a particular number of memory addresses, where the particular number of memory addresses depends on the number of data block rows (n) that are stored in the memory (e.g. and therefore on the (horizontal) size (resolution) of the output data array, i.e. on the size of each data block row) is new and advantageous in its own right.

Thus, a third embodiment of the technology described herein comprises a method of operating a data processing system comprising:

producing data in the form of blocks of data, where each block of data represents a particular region of an output data array;

storing the data in a memory of the data processing system; and

reading the data from the memory in the form of lines;

wherein reading the data from the memory in the form of lines comprises reading data from plural contiguous sets of memory addresses for the memory; and

wherein the plural contiguous sets of memory addresses are each separated by a particular number of memory addresses that depends on the size of the output data array.

A fourth embodiment of the technology described herein comprises a data processing system comprising:

a first processing stage operable to produce data in the form of blocks of data, where each block of data represents a particular region of an output data array;

a second processing stage operable to read the data from the memory in the form of lines; and

a memory;

wherein the data processing system is configured to read the data from the memory in the form of lines by reading data from plural contiguous sets of memory addresses for the memory;

and wherein the plural contiguous sets of memory addresses are each separated by a particular number of memory addresses that depends on the size of the output data array.

As will be appreciated by those skilled in the art, these embodiments of the technology described herein can, and in an embodiment do, include one or more or all of the optional features of the technology described herein, as appropriate.

Thus, for example, storing the data in the memory in an embodiment comprises storing each block of data of a first row of blocks of data in the memory at one or more memory addresses of a first set of memory addresses of a sequence of memory addresses for the memory, and storing each block of data of a second row of blocks of data in the memory at one or more memory addresses of a second set of different memory addresses of the sequence of memory addresses for the memory, where at least some of the memory addresses of the second set of memory addresses in an embodiment fall between memory addresses of the first set of memory addresses in the sequence of memory addresses for the memory, e.g. and in an embodiment as described above.

Similarly, each set of contiguous memory addresses in an embodiment comprises the same number of memory addresses, e.g. and in an embodiment as described above.

The size of the memory address skip is in an embodiment the same in respect of each row or set of plural rows of data blocks that is successively stored in and read from the memory, but in an embodiment depends on (is a function of) which data block row or set of plural data block rows of the data array is being read, and in an embodiment also on the number of data block rows (n) that are stored in the memory, e.g. and in an embodiment as described above.

The read sequence of memory addresses is in an embodiment configured to “wrap around” the memory addresses of the memory, e.g. and in an embodiment as described above.

The data processing system of the technology described herein may comprise any suitable and desired data processing system. However, as will be appreciated by those having skill in the art, the technology described herein is particularly relevant to and useful in display controllers or data processing systems comprising display controllers.

Thus, in an embodiment, the data processing system of the technology described herein is or includes a display controller.

Correspondingly, another embodiment of the technology described herein comprises a method of operating a display controller for a data processing system, the method comprising:

producing data in the form of blocks of data, where each block of data represents a particular region of an output data array;

storing the data in a memory of the display controller; and

reading the data from the memory in the form of lines;

wherein storing the data in the memory comprises:

storing each block of data of a first row of blocks of data in the memory at one or more memory addresses of a first set of memory addresses of a sequence of memory addresses for the memory; and

storing each block of data of a second row of blocks of data in the memory at one or more memory addresses of a second set of different memory addresses of the sequence of memory addresses for the memory;

wherein at least some of the memory addresses of the second set of memory addresses fall between memory addresses of the first set of memory addresses in the sequence of memory addresses for the memory.

Another embodiment of the technology described herein comprises adisplay controller for a data processing system, the display controllercomprising:

a first processing stage operable to produce data in the form of blocksof data, where each block of data represents a particular region of anoutput data array;

a second processing stage operable to read the data from the memory inthe form of lines; and

a memory;

wherein the display controller is configured to store the data in thememory by:

storing each block of data of a first row of blocks of data in the memory at one or more memory addresses of a first set of memory addresses of a sequence of memory addresses for the memory; and

storing each block of data of a second row of blocks of data in thememory at one or more memory addresses of a second set of differentmemory addresses of the sequence of memory addresses for the memory;

wherein at least some of the memory addresses of the second set ofmemory addresses fall between memory addresses of the first set ofmemory addresses in the sequence of memory addresses for the memory.

Another embodiment of the technology described herein comprises a method of operating a display controller for a data processing system, the method comprising:

producing data in the form of blocks of data, where each block of datarepresents a particular region of an output data array;

storing the data in a memory of the display controller; and

reading the data from the memory in the form of lines;

wherein reading the data from the memory in the form of lines comprisesreading data from plural contiguous sets of memory addresses for thememory; and

wherein the plural contiguous sets of memory addresses are eachseparated by a particular number of memory addresses that depends on thesize of the output data array.

Another embodiment of the technology described herein comprises adisplay controller for a data processing system, the display controllercomprising:

a first processing stage operable to produce data in the form of blocksof data, where each block of data represents a particular region of anoutput data array;

a second processing stage operable to read the data from the memory inthe form of lines; and

a memory;

wherein the display controller is configured to read the data from thememory in the form of lines by reading data from plural contiguous setsof memory addresses for the memory; and

wherein the plural contiguous sets of memory addresses are eachseparated by a particular number of memory addresses that depends on thesize of the output data array.

As will be appreciated by those skilled in the art, these embodiments ofthe technology described herein can, and in an embodiment do, includeone or more or all of the optional features of the technology describedherein, as appropriate.

The display controller may be any suitable and desired displaycontroller and may comprise any suitable and desired additionalprocessing stages that a display controller may include. The displaycontroller should be and is in an embodiment operable to provide animage for display to a display.

The (first processing stage of the) display controller in an embodiment comprises a decoder and/or a rotation stage, in an embodiment together with a read controller, e.g. and in an embodiment as described above. The (second processing stage of the) display controller in an embodiment comprises a de-tiler processing stage, e.g. as discussed above, operable to convert data received in the form of one or more blocks (tiles) to data in the form of lines, e.g. for further processing and/or display, a (pixel) unpacking processing stage operable to extract data in respect of each of plural individual data positions (pixels) from each of plural data words, one or more layer or pixel processing pipelines operable to process the data in respect of each data position (pixel) as appropriate, e.g. for display, and/or an output stage operable to provide an image for display to a display, e.g. and in an embodiment as described above.

Although the technology described herein is described above withparticular reference to the processing of a given data array (e.g. aframe for display), as will be appreciated by those skilled in the art,the technology described herein can be, and is in an embodiment, usedfor processing plural data arrays (e.g. providing plural frames fordisplay), and in an embodiment for processing a sequence of data arrays(e.g. providing a sequence of frames to be displayed to a display).

The various stages of the data processing system may be implemented asdesired, e.g. in the form of one or more fixed-function units (hardware)(i.e. that is dedicated to one or more functions that cannot bechanged), or as one or more programmable processing stages, e.g. bymeans of programmable circuitry that can be programmed to perform thedesired operation. There may be both fixed function and programmablestages.

One or more of the various processing stages of the technology describedherein may be provided as a separate circuit element(s) to other stagesof the data processing system. However, one or more stages may also beat least partially formed of shared data processing circuitry.

One or more of the various stages of the technology described herein may be operable to always carry out its function on any and all received data. Additionally or alternatively, one or more of the stages may be operable to selectively carry out its function on the received data, i.e. when desired and/or appropriate.

The data processing system may and in an embodiment does also compriseone or more of, and in an embodiment all of: a central processing unit,a graphics processing unit, a video processor (codec), a system bus, amemory controller, an image signal processor, a display processing unit,and additional elements as known to those skilled in the art.

The data processing system may be, and in an embodiment is, configuredto communicate with one or more of (and the technology described hereinalso extends to an arrangement comprising one or more of): an externalmemory (e.g. via the memory controller), one or more local displays,and/or one or more external displays.

In an embodiment, the data processing system further comprises a or thedisplay. The display that the display controller is used with may be anysuitable and desired display, such as for example, a screen (such as apanel) or a printer.

The technology described herein can be implemented in any suitablesystem, such as a suitably configured micro-processor based system. Inan embodiment, the technology described herein is implemented in acomputer and/or micro-processor based system.

The various functions of the technology described herein can be carriedout in any desired and suitable manner. For example, the functions ofthe technology described herein can be implemented in hardware orsoftware, as desired. Thus, for example, unless otherwise indicated, thevarious functional elements, stages, and “means” of the technologydescribed herein may comprise a suitable processor or processors,controller or controllers, functional units, circuitry, processinglogic, microprocessor arrangements, etc., that are operable to performthe various functions, etc., such as appropriately dedicated hardwareelements (processing circuitry) and/or programmable hardware elements(processing circuitry) that can be programmed to operate in the desiredmanner.

It should also be noted here that, as will be appreciated by thoseskilled in the art, the various functions, etc., of the technologydescribed herein may be duplicated and/or carried out in parallel on agiven processor. Equally, the various processing stages may shareprocessing circuitry, etc., if desired.

Furthermore, any one or more or all of the processing stages of thetechnology described herein may be embodied as processing stagecircuitry, e.g., in the form of one or more fixed-function units(hardware) (processing circuitry), and/or in the form of programmableprocessing circuitry that can be programmed to perform the desiredoperation. Equally, any one or more of the processing stages andprocessing stage circuitry of the technology described herein maycomprise a separate circuit element to any one or more of the otherprocessing stages or processing stage circuitry, and/or any one or moreor all of the processing stages and processing stage circuitry may be atleast partially formed of shared processing circuitry.

Subject to any hardware necessary to carry out the specific functionsdiscussed above, the display processing pipeline can otherwise includeany one or more or all of the usual functional units, etc., that displayprocessing pipelines include.

The display processor in an embodiment also comprises, and/or is incommunication with, one or more memories and/or memory devices thatstore the data described herein, and/or that store software forperforming the processes described herein. The display processor mayalso be in communication with the host microprocessor, and/or with adisplay for displaying images based on the data generated by the displayprocessor.

It will also be appreciated by those skilled in the art that all of thedescribed embodiments of the technology described herein can, and in anembodiment do, include, as appropriate, any one or more or all of thefeatures described herein.

The methods in accordance with the technology described herein may beimplemented at least partially using software e.g. computer programs. Itwill thus be seen that when viewed from further embodiments thetechnology described herein provides computer software specificallyadapted to carry out the methods herein described when installed on adata processor, a computer program element comprising computer softwarecode portions for performing the methods herein described when theprogram element is run on a data processor, and a computer programcomprising code adapted to perform all the steps of a method or of themethods herein described when the program is run on a data processingsystem. The data processor may be a microprocessor system, aprogrammable FPGA (field programmable gate array), etc.

The technology described herein also extends to a computer softwarecarrier comprising such software which when used to operate a graphicsprocessor, renderer or microprocessor system comprising a data processorcauses in conjunction with said data processor said processor, rendereror system to carry out the steps of the methods of the technologydescribed herein. Such a computer software carrier could be a physicalstorage medium such as a ROM chip, CD ROM, RAM, flash memory, or disk,or could be a signal such as an electronic signal over wires, an opticalsignal or a radio signal such as to a satellite or the like.

It will further be appreciated that not all steps of the methods of thetechnology described herein need be carried out by computer software andthus from a further broad embodiment the technology described hereinprovides computer software and such software installed on a computersoftware carrier for carrying out at least one of the steps of themethods set out herein.

The technology described herein may accordingly suitably be embodied asa computer program product for use with a computer system. Such animplementation may comprise a series of computer readable instructionseither fixed on a tangible, non-transitory medium, such as a computerreadable medium, for example, diskette, CD ROM, ROM, RAM, flash memory,or hard disk. It could also comprise a series of computer readableinstructions transmittable to a computer system, via a modem or otherinterface device, over either a tangible medium, including but notlimited to optical or analogue communications lines, or intangibly usingwireless techniques, including but not limited to microwave, infrared orother transmission techniques. The series of computer readableinstructions embodies all or part of the functionality previouslydescribed herein.

Those skilled in the art will appreciate that such computer readableinstructions can be written in a number of programming languages for usewith many computer architectures or operating systems. Further, suchinstructions may be stored using any memory technology, present orfuture, including but not limited to, semiconductor, magnetic, oroptical, or transmitted using any communications technology, present orfuture, including but not limited to optical, infrared, or microwave. Itis contemplated that such a computer program product may be distributedas a removable medium with accompanying printed or electronicdocumentation, for example, shrink wrapped software, pre-loaded with acomputer system, for example, on a system ROM or fixed disk, ordistributed from a server or electronic bulletin board over a network,for example, the Internet or World Wide Web.

An embodiment of the technology described herein will now be describedwith reference to the Figures.

FIG. 1 shows schematically a data processing system in accordance with an embodiment of the technology described herein. The data processing system comprises a video codec 1, central processing unit (CPU) 2, graphics processing unit (GPU) 3, display controller 4 and a memory controller 7. As shown in FIG. 1, these communicate via an interconnect 6 and have access to off-chip main memory 8. The video codec 1, CPU 2, and/or the GPU 3 generate output surfaces and store them, via the memory controller 7, in a frame buffer in the off-chip memory 8. The display controller 4 then reads output surfaces from the frame buffer in the off-chip memory 8 via the memory controller 7 and sends them to a display 5 for display.

FIG. 2 shows schematically a display controller 4 in accordance with anembodiment of the technology described herein. In FIG. 2, the rectanglesrepresent functional units of the display controller, while the arrowedlines represent connections between the various functional units.

FIG. 2 shows the main elements of the display controller 4 that arerelevant to the operation of the present embodiment. As will beappreciated by those skilled in the art there will be other elements ofthe display controller 4 that are not illustrated in FIG. 2. It shouldalso be noted here that FIG. 2 is only schematic, and that, for example,in practice the shown functional units and stages may share significanthardware circuits, even though they are shown schematically as separatestages in FIG. 2. It will also be appreciated that each of the stages,elements and units, etc., of the display controller 4 as shown in FIG. 2may be implemented as desired and will accordingly comprise, e.g.,appropriate circuitry and/or processing logic, etc., for performing thenecessary operation and functions.

In the present embodiment, the display controller 4 comprises a read controller in the form of a Direct Memory Access (DMA) read controller 10. The read controller 10 is configured to read one or more surfaces from main memory 8 (not shown in FIG. 2) via an interface such as an Advanced eXtensible Interface (AXI). The one or more surfaces will typically be in the form of (optionally compressed) RGB data.

Co-located with the read controller 10 is a decoder 11 which can be used to (selectively) decode (decompress) received compressed surfaces as necessary, before onward transmission of the one or more decoded (decompressed) surfaces. The decoder 11 may comprise an ARM Frame Buffer Compression (AFBC) decoder (AFBC is described in US 2013/0034309 A1). It would, of course, be possible to use other compression schemes. The use of compression reduces the bandwidth associated with the display controller 4 reading surfaces from the off-chip memory 8.

Similarly, rotation unit 12 can be used to selectively rotate one ormore of the input surfaces as necessary before onward transmission ofthe one or more input surfaces.

In the illustrated embodiment, the read controller 10 is configured to read up to three different input surfaces (layers) which are to be used to generate a composited output frame. In this embodiment, the three input layers comprise one video layer, e.g. generated by a video processor (codec), and two graphics layers, e.g. two graphics windows generated by a graphics processing unit (GPU). Hence, FIG. 2 shows the display controller onwardly transmitting three input surfaces (display layers) via three layer pipelines or channels, namely video channel 13 a, a first graphics channel 13 b, and a second graphics channel 13 c. Any or all of the transmitted input surfaces may have been subjected to decoding (decompression) by decoder 11 and/or rotation by rotation unit 12, as discussed above.

Although the embodiment of FIG. 2 illustrates the use of three inputsurfaces, it will be appreciated that any number of input surfaces(layers), and any combination of one or more types of input surface(e.g. video and/or graphics layers, etc.), may be used in the technologydescribed herein, depending on the application in question (and alsodepending on any silicon area constraints, etc.). Equally, any number oflayer pipelines or channels may be provided and used, as desired.

The display controller 4 of the present embodiment optionally comprisesa multiplexer/data-flow control 14. Where present, the displaycontroller may be configured such that multiplexer 14 receives inputsfrom any one or more (or all) of the input surface channels. Themultiplexer 14 may operate to selectively transmit any one or more (orall) of the received inputs (i.e. surfaces) to any one or more of themultiplexer's 14 outputs.

The display controller 4 of the present embodiment optionally comprisesa composition unit 15. Where present, the display controller 4 may beconfigured such that the composition unit 15 receives inputs directlyfrom any one or more or all of the channels 13, and/or from themultiplexer 14. The composition unit 15 may operate to compose thereceived input surfaces to generate a composited output frame, i.e. byappropriate blending operations, etc. In the illustrated embodiment, thecomposited output frame may be onwardly transmitted by the compositionunit 15 to multiplexer 14, and/or to post-processing pipeline 16.

The post-processing pipeline 16 is configured to selectively carry outany desired processing operation(s) on the (composited) output surface(frame). The post-processing pipeline 16 may, for example, comprise acolour conversion stage operable to apply a colour conversion to the(composited) output frame, a dithering stage operable to apply ditheringto the (composited) output frame, and/or a gamma correction stageoperable to carry out gamma correction on the (composited) output frame.

In the present embodiment, the post-processing pipeline 16 is configuredto transmit the (processed) composited output frame to an output stagecomprising a display timing unit 17 for appropriate display on a (local)display (not shown). The display timing unit 17 is configured to sendpixel data to the display with appropriate horizontal and verticalblanking periods.

The display controller 4 of the present embodiment optionally comprisesa scaling engine 18. Where present, the scaling engine 18 operates to(selectively) scale (i.e. upscale or downscale) any one or more receivedsurfaces (frames) to generate a scaled surface (frame).

In the present embodiment, the display controller optionally comprises awrite controller 19, e.g. in the form of a DMA write controller. Wherepresent, the write controller 19 may be configured to write out receivedsurfaces (frames) to external memory 8 (e.g. frame buffer), e.g. viaAXI.

Thus, this embodiment of the technology described herein comprises adisplay controller that integrates a decoder 11, and a rotation unit 12,optionally together with a composition unit 15, and/or a scaling engine18 capable of up and down-scaling surfaces. The decoder 11 and therotation unit 12 are embedded within the display controller, such thatsurfaces read by the display controller 4 may be decoded (decompressed)and/or rotated (and then optionally further processed, e.g. compositedand/or scaled) before being displayed, with only a single read (of eachinput surface) from the frame buffer being required.

FIG. 3 shows in more detail a portion of the display controller 4 thatis particularly relevant to the operation of an embodiment. In thisembodiment, one or more surfaces stored in the frame buffer in theoff-chip memory 8 are compressed using AFBC or another block-basedencoding scheme.

In AFBC and other block-based encoding schemes, each compressed surface is encoded as plural blocks (tiles) of data, where each block (tile) of data represents a particular region of the surface. Accordingly, the display controller 4 fetches each surface from the frame buffer in the off-chip memory 8 in blocks (tiles), i.e. block by block (tile by tile). In contrast, the display controller 4 provides output images for display to the display 5 in raster lines, i.e. rows of pixel positions that are one pixel position high and many pixel positions wide (long). Accordingly, the display controller 4 converts read block (tile) data into raster line data, and then sends the raster line data to the display 5 for display.

In the present embodiment, the read controller 10 is configured to read one or more blocks (tiles) of the compressed surface from main memory 8 via an interface such as an Advanced eXtensible Interface (AXI). To do this, the read controller 10 sends requests to the memory 8 via an (AXI) address channel, and receives data from the memory 8 via an (AXI) data channel. The read controller 10 may comprise a re-ordering buffer to allow blocks (tiles) that are received by the read controller 10 out of order to be re-ordered, if desired.

The one or more compressed blocks (tiles) will be in the form of AFBCencoded RGB data.

Compressed blocks (tiles) read by the read controller 10 are thenonwardly transmitted to the decoder 11 via an interface such as an AXIinterface or a valid/ready interface. The decoder 11 operates to decodethe compressed blocks (tiles) to produce uncompressed tiles, e.g. of RGBdata. The uncompressed blocks are then onwardly transmitted to ade-tiler 20 in the form of data words such as AXI words. Each word willinclude data in respect of multiple pixels (data positions) of the block(tile).

The de-tiler 20 operates to convert the block (tile) data to raster linedata. As shown in FIG. 3, in order to do this, the de-tiler 20 comprisesa buffer memory 21. Block (tile) data received by the de-tiler 20 iswritten to the buffer 21 block by block (tile by tile), and then rasterline data in the form of data words such as AXI width words is read outfrom the buffer 21 by reading the appropriate words for each raster linefrom the buffer 21.
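By way of illustration only, the basic de-tiling operation described above (writing a row of tiles to a buffer tile by tile, then reading the words of each raster line) might be sketched as follows. The function name `detile_row` and the representation of a tile as a list of lines, each line being a list of data words, are assumptions made for this sketch and do not appear in the embodiment.

```python
def detile_row(tiles, lines_per_tile, words_per_line):
    """Convert one row of tiles into raster lines (illustrative sketch).

    tiles: list of tiles; each tile is a list of `lines_per_tile` lines,
    and each line is a list of `words_per_line` data words.
    """
    # Write phase: store the tiles in the buffer tile by tile.
    buffer = []
    for tile in tiles:
        for line in tile:
            buffer.extend(line)

    # Read phase: for each raster line, gather that line's words
    # from every tile in turn.
    words_per_tile = lines_per_tile * words_per_line
    raster_lines = []
    for y in range(lines_per_tile):
        raster = []
        for t in range(len(tiles)):
            base = t * words_per_tile + y * words_per_line
            raster.extend(buffer[base:base + words_per_line])
        raster_lines.append(raster)
    return raster_lines
```

For two tiles `[["a0"], ["a1"]]` and `[["b0"], ["b1"]]`, each two lines high and one word wide, the sketch returns the raster lines `["a0", "b0"]` and `["a1", "b1"]`.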

The line data in the form of (AXI) words is then fed to a pixel unpackeror layer controller via a latency-hiding first in first out (FIFO)buffer 22. The pixel unpacker extracts data for individual pixels (datapositions) from the received (AXI) words, and onwardly transmits thedata to the appropriate channel or layer pipeline 13 for furtherprocessing (e.g. as discussed above).

FIG. 4 illustrates schematically the operation of the display controller4 when fetching compressed block (tile) data and converting it into linedata. As shown in FIG. 4, compressed blocks (e.g. of an AFBC compressedlayer) are fetched from external memory 8 via an (AXI) interface (step30). The blocks are decompressed and written to de-tiler 20 (step 31).Data words (AXI words) are then read from the de-tiler 20 and sent tothe pixel unpacker 23 (step 32). Data in respect of individual pixels isextracted from the (AXI) words and sent to layer or pixel processingpipeline 13, e.g. at a rate of 1 pixel per clock cycle (step 33).

In de-tiler arrangements, as shown in FIG. 3, a sequence of (X) blocks (tiles) corresponding to an entire line (frame) width of blocks (tiles), i.e. a row of blocks, should be written to the de-tiler buffer 21 before the block data can be read out in lines. However, this can result in a bottleneck in the system, where it is necessary to wait for an entire row of blocks to be written to the de-tiler buffer 21 before the block data can be read out in lines, and to then wait for the entire row of blocks to be read out in lines before a further row of blocks can be written to the de-tiler buffer 21, etc.

It would be possible to address this using a so-called “ping pong” buffer arrangement, wherein, while a first row of block data is being read out from a first buffer in lines, a second row of block data is written to a second buffer. However, this is relatively expensive in terms of buffer resources, since it would be necessary to configure the buffer to be able to store at least two rows of blocks of data for the maximum resolution data array that the display controller 4 is configured to support. In other words, the number of line buffers required would be twice the height of each tile.

The Applicants have recognised that it is, in fact, possible to addressthe above-described problem while configuring the buffer 21 to be ableto store only a single row of blocks of data for the maximum resolutiondata array that the display controller 4 is configured to support, andthat this is more efficient, e.g. in terms of use of resources and powerconsumption of the data processing system.

In particular, when lines of data words are read from the de-tiling buffer 21, the next arriving blocks of data are able to be written into the de-tiling buffer by taking advantage of sufficient space having been freed by the reading. In particular, when a number of data words (N) corresponding to a block of data has been read from the de-tiling buffer 21 (despite the fact that the reading is not itself performed in terms of blocks, but rather in terms of lines (pixel rows)), the next arriving block of data can be directly written to the space which has been released by the reading procedure.

Furthermore, when reading out a line (i.e. a frame width pixel row), a sequence of read buffer addresses which comprises “jumps” or “skip steps” can be used to account for the fact that the buffer addresses are written to in blocks, but read from in lines. The size of these skip steps is commensurately increased for each block row of a data array which is passed through the de-tiling buffer, but eventually the increase in size of these skip steps will “wrap around” and the pattern of reads and writes repeats.

Where there are plural data words in each pixel line of each block, then when reading data words for a pixel row from the de-tiling buffer 21, sequential buffer addresses corresponding to the number of data words in each pixel line of each block are read, interspersed by the skip steps which increase in size until they wrap around, i.e. the skip step returns to zero.

It will be appreciated that this arrangement provides a particularlysimple and efficient reading procedure for reading the data in the formof lines.

FIG. 5 illustrates this addressing scheme. FIG. 5 shows an arrangementcomprising a de-tiler buffer 21 that has a total of 16 memory locationswith addresses 0-15, where each tile of the output data array (frame)occupies four memory addresses. Accordingly, the buffer 21 canaccommodate a row of four tiles.

Although each line (row) of data positions of each tile is shown as being accommodated by a single memory address in FIG. 5, in practice each line (row) of data positions of each tile may instead be accommodated by a set of contiguous memory addresses. Equally, although FIG. 5 shows an arrangement in which the de-tiler buffer 21 can accommodate a row of four tiles, in practice the de-tiler buffer may be configured to accommodate a row of any number of tiles, e.g. many more than four, e.g. depending on the maximum resolution data array that the display controller 4 is configured to support.

As shown in FIG. 5, a row of tiles (row N) of an output data array (frame) comprising four tiles (tile 0 to tile 3) is initially written to the buffer 21 using a linear sequence of memory addresses 0-15 (step 61). In order to read out the top line of the row of tiles, memory addresses 0, 4, 8 and 12 are then read in order, and in order to read each of the following lines of tiles, memory addresses 1, 5, 9, 13; 2, 6, 10, 14; and then 3, 7, 11, 15 are read in order (step 62).

Since, in this illustrative example, each line occupies the same number of memory addresses as each tile, after memory addresses 0, 4, 8 and 12 have been read, a further tile, e.g. of the next row (row N+1), can be written to these “freed up” memory addresses. Similarly, after reading the memory addresses in respect of each line, a further tile of the next row (row N+1) can be written to the “freed up” memory addresses.

Accordingly, the tiles of the next row of tiles (row N+1) are written tothe memory at memory addresses that follow the read pattern. Hence, step62 illustrates both the read and the subsequent write operations.

As shown in FIG. 5, in order to read this next row of tiles (row N+1) in the form of lines, a different sequence of memory addresses is used, namely the linear sequence 0-15 (step 63). As also illustrated by step 63, the tiles of the next row of tiles (row N+2) are written to the memory at memory addresses that follow this read pattern.

As such, the pattern of writes and reads effectively repeats itself.Accordingly, step 64 illustrates the sequence of buffer addresses forreading row N+2 and the sequence of buffer addresses for writing of thenext row of tiles (row N+3) to the buffer 21, which corresponds to step62.
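The repeating pattern of FIG. 5 can be reproduced with a short simulation, offered here purely as an illustrative sketch. It assumes, as in FIG. 5, one data word per tile line, and it models the rule that each incoming row of tiles is written to buffer addresses in the same order in which the previous row was read. The names `read_order` and `simulate` are hypothetical.

```python
def read_order(write_order, tiles_per_row, lines_per_tile):
    """Return the buffer addresses in raster-line read order.

    write_order[j] is the address holding the j-th word of the tile
    stream, where j = tile * lines_per_tile + line (one word per line).
    """
    return [write_order[t * lines_per_tile + y]
            for y in range(lines_per_tile)      # each raster line in turn
            for t in range(tiles_per_row)]      # across all tiles


def simulate(num_rows, tiles_per_row=4, lines_per_tile=4):
    """Return the read address sequence used for each successive tile row."""
    size = tiles_per_row * lines_per_tile
    write_order = list(range(size))             # first row: linear write
    history = []
    for _ in range(num_rows):
        reads = read_order(write_order, tiles_per_row, lines_per_tile)
        history.append(reads)
        # The next row of tiles is written to the freed addresses,
        # in the order in which they were read.
        write_order = reads
    return history
```

With the default FIG. 5 dimensions, `simulate(3)` yields the sequence 0, 4, 8, 12, 1, 5, 9, 13, ... for row N, the linear sequence 0-15 for row N+1, and the first sequence again for row N+2.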

In general, after a first tile row has been written to sequential buffer addresses, the skip steps required to read out the lines of pixel data (and indeed to then write further block data into the buffer addresses freed up by that read) are given by:

B·(N/B)^(R)−(B−1),

where R is the tile row number, B is the number of data words in eachpixel line of a tile, and N is the number of data words used for eachtile.

Hence, for example, with reference to FIG. 5, where N=4 and B=1, the skip steps are given by 1×(4/1)^(R)−(1−1), i.e. 1×(4)^(R)−0, being 4 for the first read/second write (step 62), 16 for the second read/third write (step 63), and 64 for the third read/fourth write (step 64). This step size is configured to “wrap-around”, i.e. in order to be appropriately constrained within the memory locations of the memory.

The memory addresses are subject to modulo M, where M is the number of data words in each frame-width tile row, i.e. the memory address returns to the first memory address (0) after the last memory address (15).
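As a worked illustration, the skip step expression can be evaluated directly for the FIG. 5 values. The sketch below also generates read addresses for the B=1 case. The wrap-around rule it uses, namely reducing the k-th address modulo M−1 while keeping the final address M−1 fixed, is an assumption chosen for this sketch because it reproduces the address sequences shown in FIG. 5; the text above states only that addresses are subject to modulo M, so this detail should not be read as the embodiment's definition. The function names are hypothetical.

```python
def skip_step(r, n, b):
    """Evaluate B*(N/B)^R - (B-1) for tile row number r."""
    if n % b != 0:
        raise ValueError("N must be a multiple of B")
    return b * (n // b) ** r - (b - 1)


def read_addresses(r, n, m):
    """Read address sequence for tile row r, for B = 1 only.

    Assumption of this sketch: the k-th address is (k * step) mod (M - 1),
    except that the last address M - 1 is always read last.
    """
    step = skip_step(r, n, 1)
    last = m - 1
    return [last if k == last else (k * step) % last for k in range(m)]
```

For N=4 and B=1, `skip_step` gives 4, 16 and 64 for R=1, 2 and 3, matching the values stated above. Under the stated wrap-around assumption with M=16, R=1 gives 0, 4, 8, 12, 1, ..., R=2 gives the linear sequence 0-15, and R=3 repeats the R=1 sequence.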

The Applicants have furthermore recognised that in these arrangements,where the horizontal size (resolution) of the output data array (frame)is less than the maximum supported resolution, then the de-tiler buffer21 will effectively be underutilised. Accordingly, where possible, itwould be desirable to store more than one row of tiles in the buffer 21.

Storing more than one row of tiles in the buffer 21 has the effect ofincreasing the latency tolerance of the system. This is because doing soallows the de-tiler 20 to begin reading the data in the form of lineswhen the buffer 21 is only partially filled with data, and at the sametime as (further) data is being produced and stored in the buffer 21.

In the present embodiment, the de-tiler 20 is operable to store pluralrows of tiles in the buffer 21 together when it is possible to do so,e.g. when the horizontal size of the output data array is a fraction ofthe maximum supported resolution.

For example, where the maximum supported horizontal resolution is 4K, for layers having a size of between 4K and 2K, the de-tiler 20 can buffer one row of tiles (using the above described addressing scheme). For layers having a size of between 2K and 1K, the de-tiler 20 can buffer two rows of tiles. For layers having a size of between 1K and 0.5K, the de-tiler 20 can buffer four rows of tiles. For layers having a size of between 0.5K and 0.25K, the de-tiler 20 can buffer eight rows of tiles, etc.
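The relationship between layer width and the number of tile rows that can be buffered together might be expressed as in the following sketch. It assumes that 4K means 4096 data positions, that the number of rows is always a power of two, and that a layer whose width falls exactly on a boundary (e.g. exactly 2K) takes the larger row count; none of these details is stated in the text, and the function name `rows_buffered` is hypothetical.

```python
def rows_buffered(layer_width, max_width=4096):
    """Number of tile rows that fit in a buffer sized for max_width.

    Returns the largest power of two p such that p * layer_width
    does not exceed max_width (an assumption of this sketch).
    """
    if not 0 < layer_width <= max_width:
        raise ValueError("layer width must be in (0, max_width]")
    rows = 1
    while layer_width * rows * 2 <= max_width:
        rows *= 2
    return rows
```

Under these assumptions, a layer 3000 positions wide is buffered one row at a time, a layer 1920 wide two rows at a time, a layer 1000 wide four rows at a time, and a layer 500 wide eight rows at a time.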

One conceptually simple way to implement this would be to store each row of the plural rows of tiles separately, e.g. each occupying its own contiguous set of memory addresses in the buffer 21. However, this would require, inter alia, the provision of plural pointers in order to keep track of each row of tiles in the memory separately, and would therefore be relatively complex.

The Applicants have recognised that an improved way to store plural rows of tiles in the de-tiler buffer 21 is to store the tiles of each of the plural rows in an interspersed, e.g. interleaved, manner.

For example, where two rows of tiles are to be stored in the memory together, the tiles of each row are stored in the memory in an interleaved manner. Where four rows of tiles are to be stored in the memory together, the tiles of each row are stored in the memory in an interleaved manner, e.g. such that the first tile of the first row is stored, followed by the first tile of the second row, the first tile of the third row, the first tile of the fourth row, and then the second tile of the first row, and so on.

In order to facilitate this, the tiles of each row are stored in the memory using separated memory addresses.

Thus, for example, where two rows of tiles are to be stored in the memory, after writing one tile of the first row to the de-tiler buffer 21, the address space that would be required for the next tile is not written to, but instead the next in-coming tile is written to the address space that would otherwise be used for the next-but-one tile. After writing the first row of tiles in this manner, the second row of tiles is written to the empty spaces that were left unwritten when writing the first row of tiles.

The address scheme of the present embodiment is illustrated in more detail by FIG. 6. FIG. 6 shows an arrangement comprising a de-tiler buffer 21 that has a total of 32 memory locations with memory addresses 0-31, where each tile of the output data array (frame) occupies four memory addresses. Accordingly, the buffer 21 can accommodate two rows of tiles, where each row comprises four tiles.

Although each line (row) of data positions of each tile is shown as being accommodated by a single memory address in FIG. 6, in practice each line (row) of data positions of each tile may instead be accommodated by a set of contiguous memory addresses. Equally, although FIG. 6 shows an arrangement in which the de-tiler buffer 21 can accommodate two rows of four tiles, in practice the de-tiler buffer may be configured to accommodate any number of rows of tiles, e.g. many more than this, e.g. depending on the maximum resolution data array that the display controller 4 is configured to support.

As shown in FIG. 6, a first row of tiles (row N) comprising four tiles is initially written to the buffer 21 (step 71). The first tile (tile 0) is written to the first available memory addresses, namely memory addresses 0-3. However, since in this example two rows of tiles are to be written to the buffer 21, the next tile in the row (row N) (tile 1) is written to memory addresses 8-11. That is, a gap of four memory addresses (namely memory addresses 4-7) is left between tile 0 and tile 1 of row N. Similarly, the third tile (tile 2) is written to memory addresses 16-19 and the fourth tile (tile 3) is written to memory addresses 24-27.

Next (in step 71), the tiles of the second row of tiles (row N+1) are written to the buffer 21 at memory addresses between the memory addresses that tiles of the first row were written to. Accordingly, tile 0 of row N+1 is written to memory addresses 4-7, tile 1 of row N+1 is written to memory addresses 12-15, tile 2 of row N+1 is written to memory addresses 20-23, and tile 3 of row N+1 is written to memory addresses 28-31.
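
The first-write addresses listed above follow a simple closed form, sketched here for the FIG. 6 example (two buffered rows, four memory addresses per tile). The function name `tile_addresses` and its parameter names are illustrative assumptions and do not appear in the described embodiment.

```python
def tile_addresses(row, tile, rows_buffered=2, tile_words=4):
    """Addresses occupied by `tile` of buffered row `row` in the first write.

    Tiles of the buffered rows are interleaved, so consecutive tiles of one
    row are separated by a gap of (rows_buffered - 1) tiles.
    """
    base = (tile * rows_buffered + row) * tile_words
    return range(base, base + tile_words)


if __name__ == "__main__":
    for row in range(2):
        for tile in range(4):
            addrs = tile_addresses(row, tile)
            print(f"row N+{row}, tile {tile}: {addrs[0]}-{addrs[-1]}")
```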

Now, in order to read the first row (row N) in the form of lines, it can be seen from FIG. 6 that memory addresses 0, 8, 16, 24; 1, 9, 17, 25; 2, 10, 18, 26; and then 3, 11, 19, 27 should be read. Similarly, in order to read the second row (row N+1) in the form of lines, it can be seen from FIG. 6 that memory addresses 4, 12, 20, 28; 5, 13, 21, 29; 6, 14, 22, 30; and then 7, 15, 23, 31 should be read (step 72).

It will be appreciated that in this example, each memory address in this read sequence of memory addresses is separated by 8 memory addresses (where the memory addresses “wrap around” to memory address 0 from memory address 31).
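
One way to generate this read sequence is sketched below. The wrap-around is modelled here as stepping modulo (size − 1), with the final address handled separately; this particular formulation is an assumption, chosen because it reproduces the addresses listed for FIG. 6, and the name `read_sequence` is hypothetical.

```python
def read_sequence(step, size=32):
    """Addresses visited when stepping through the buffer by `step`.

    Assumption: wrap-around is modelled as modulo (size - 1), with the last
    address (size - 1) always visited last.
    """
    last = size - 1
    return [(i * step) % last for i in range(last)] + [last]


if __name__ == "__main__":
    seq = read_sequence(8)
    # Print the sequence four addresses (one raster line) at a time.
    for i in range(0, len(seq), 4):
        print(seq[i:i + 4])
```

The first sixteen addresses correspond to the four lines of row N and the remaining sixteen to the four lines of row N+1.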

In step 72, the third (row N+2) and fourth (row N+3) rows are written to the memory using this read sequence of memory addresses. However, in a corresponding manner to that described above, adjacent tiles of the third row (row N+2) are stored using memory addresses that are separated by a memory address gap in the sequence of memory addresses. The memory address gaps are then filled with tiles from the fourth row (row N+3) of tiles.

Accordingly, the first tile of the third row (row N+2) is written to the first available memory addresses, namely memory addresses 0, 8, 16, 24, and then the following tiles are written to memory addresses 2, 10, 18, 26 (i.e. leaving out memory addresses 1, 9, 17, 25 in the sequence of memory addresses); 4, 12, 20, 28; and then 6, 14, 22, 30. The tiles of the fourth row (row N+3) are then written to memory addresses 1, 9, 17, 25; 3, 11, 19, 27; 5, 13, 21, 29; and then 7, 15, 23, 31.

It can again be seen from the “2nd read” step in step 73 of FIG. 6 that the consequence of storing the two rows of tiles in the buffer 21 in this “mixed in” manner is that the memory addresses that should be read in order to read out the third and fourth rows in the form of lines are separated by a constant memory address skip step, namely two memory addresses in this example.
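
The second write can be reproduced by applying the same interleaving to positions in the previous read sequence rather than to raw addresses. The sketch below assumes the modulo (size − 1) wrap-around model, which is not stated in the text but matches the listed addresses; the names `read_sequence` and `second_write` are illustrative only.

```python
SIZE, TILE_WORDS, ROWS = 32, 4, 2  # FIG. 6 example values


def read_sequence(step, size=SIZE):
    """Step through the buffer by `step`, wrapping modulo (size - 1)."""
    last = size - 1
    return [(i * step) % last for i in range(last)] + [last]


def second_write(row, tile, step=8):
    """Addresses for `tile` of buffered row `row` (0 = row N+2, 1 = row N+3).

    Tiles are interleaved along the previous read sequence (skip step 8).
    """
    seq = read_sequence(step)
    start = (tile * ROWS + row) * TILE_WORDS
    return seq[start:start + TILE_WORDS]


# The next read treats the tile height as ROWS times its actual value.
next_step = (8 * TILE_WORDS * ROWS) % (SIZE - 1)

if __name__ == "__main__":
    for row in range(ROWS):
        for tile in range(4):
            print(f"row N+{row + 2}, tile {tile}: {second_write(row, tile)}")
    print("skip step for the 2nd read:", next_step)
```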

As shown in FIG. 6, this pattern of writing and reading is repeated for the following rows of data of the data array (frame) (steps 73, 74).

In general, the read sequence of memory addresses that is required to read plural rows of tiles stored together in the buffer 21 in the interleaved manner of the present embodiment is similar to the read sequence of memory addresses that is required to read a single row of tiles stored in the buffer 21 in the manner described above in relation to FIG. 5, except that the memory address skip step is different.

In particular, where two rows of tiles are stored in the buffer 21, when reading the data from the buffer 21 in raster scan order, read addresses are generated using essentially the same scheme, but assuming that the tile height is double the actual value.

More generally, where n rows of tiles are stored in the buffer 21, when reading the data from the buffer 21 in raster scan order, read addresses are generated using essentially the same scheme as described above in relation to FIG. 5, but assuming that the tile height is n times the actual value. This facilitates a particularly simple and seamless de-tiling operation, regardless of the number of rows of tiles that are stored in the de-tiler buffer 21.
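
The "tile height is n times the actual value" rule can be checked with a small simulation, sketched below under stated assumptions: one memory word per tile line, a buffer that exactly holds the buffered rows, wrap-around modelled as modulo (size − 1), and each group of rows fully written before it is read (real hardware would overlap the two). The function name `simulate` and its parameters are hypothetical.

```python
def simulate(batches=5, size=32, tile_h=4, tiles=4, rows=2):
    """Write interleaved tiles, read raster lines, and return the skip steps."""
    if size != tile_h * tiles * rows:
        raise ValueError("buffer must exactly hold the buffered rows")
    last = size - 1

    def addr(i, step):
        # i-th address of the sequence with the given skip step.
        return last if i == last else (i * step) % last

    buf = [None] * size
    step, steps = 1, []
    for b in range(batches):
        # Write: tiles of the buffered rows interleaved along the sequence.
        k = 0
        for t in range(tiles):
            for r in range(rows):
                for line in range(tile_h):
                    buf[addr(k, step)] = (b * rows + r, line, t)
                    k += 1
        # Read: same scheme with an effective tile height of rows * tile_h.
        step = (step * tile_h * rows) % last
        steps.append(step)
        out = [buf[addr(i, step)] for i in range(size)]
        expected = [(b * rows + r, line, t)
                    for r in range(rows)
                    for line in range(tile_h)
                    for t in range(tiles)]
        if out != expected:
            raise RuntimeError("read-out is not in raster order")
    return steps


if __name__ == "__main__":
    print(simulate())
```

With the FIG. 6 values, the first two skip steps returned are 8 and 2, matching the first and second reads described above.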

As described above, the buffering of multiple rows of tiles where possible in the manner of the present embodiment also allows significantly improved latency tolerance while consuming the same amount of power as would be used if only one row was buffered at a time, and also allows improved clustering/bundling of memory read transactions, which beneficially results in longer periods of silence on the interconnect or bus 6.

FIG. 7 illustrates a de-tiling process according to the present embodiment.

The process begins when the de-tiling operation begins (step 80). The first tile of the current row of tiles is written to the first memory address (memory address 0) of the buffer 21. This is done by the de-tiler 20 setting the address to zero (step 81), requesting the tile data from the source (e.g. the decoder 11) (step 82), and storing the tile data in the de-tiler buffer 21 (step 83).

When this first tile has been stored in the buffer, a record (named “Temp”) is made of the next available memory address (steps 86, 87) of the buffer 21.

The writing process then jumps a number of memory addresses equal to the number of memory addresses required to store one tile (step 85), and stores the next tile in the current row of tiles in the buffer 21 at the next set of memory addresses. Again, this is done by the de-tiler 20 requesting the tile data from the source (e.g. the decoder 11) (step 82), and storing the tile data in the de-tiler buffer 21 (step 83).

When the last tile in the current row of tiles has been stored in the buffer in this manner (when step 84 is satisfied), the process then moves on to store the next row of tiles in the buffer 21.

The first tile of this next row of tiles is written to the first available memory address of the buffer 21. This is done by the de-tiler 20 setting the address to the recorded memory address (i.e. “Temp”) (step 88), requesting the tile data from the source (e.g. the decoder 11) (step 89), and storing the tile data in the de-tiler buffer 21 (step 90).

The writing process then jumps a number of memory addresses equal to the number of memory addresses required to store one tile (step 92), and stores the next tile in the current row of tiles in the buffer 21 at the next set of memory addresses. Again, this is done by the de-tiler 20 requesting the tile data from the source (e.g. the decoder 11) (step 89), and storing the tile data in the de-tiler buffer 21 (step 90).

When the last tile in this next row of tiles has been stored in the buffer in this manner (when step 91 is satisfied), the process is then iteratively repeated in respect of the remaining rows of tiles of the current layer, and ends at step 94 when it is determined (at step 93) that the last row of tiles has been reached.
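
The write flow described above for one pair of rows can be sketched procedurally as follows. This is an illustrative reading of the text, not a transcription of FIG. 7: it covers only the first (linear) pass over the buffer, the step numbers in the comments are those quoted in the description, and the function name `fig7_write_order` and its parameters are hypothetical.

```python
def fig7_write_order(tiles_per_row=4, tile_words=4):
    """Return (row, tile, base_address) tuples in the order tiles are stored."""
    order = []
    temp = None
    address = 0                               # step 81: start at address 0
    for tile in range(tiles_per_row):
        order.append((0, tile, address))      # steps 82-83: fetch and store
        address += tile_words                 # now at next available address
        if temp is None:
            temp = address                    # steps 86-87: record "Temp"
        address += tile_words                 # step 85: jump over one tile
    address = temp                            # step 88: restart at "Temp"
    for tile in range(tiles_per_row):
        order.append((1, tile, address))      # steps 89-90: fetch and store
        address += 2 * tile_words             # stored tile plus step 92 jump
    return order


if __name__ == "__main__":
    for row, tile, base in fig7_write_order():
        print(f"row {row}, tile {tile}: addresses {base}-{base + 3}")
```

Only a single recorded address ("Temp") is needed to locate the second row, which reflects the point made earlier that interleaving avoids keeping a separate pointer per contiguous row.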

When two rows of tiles are to be stored in the buffer 21, the complete addressing scheme, where NumLB is the tile height and ALW is the layer width, is defined as follows:

Adr_1st_NZ = 1
for (j = 0; j < (LayerHeight − (NumLB×2)); j = (j + NumLB×2))
{
    Adr₀ = First NonZero address in last write × (NumLB×2)
    Adr₁ = { Adr₀ < (ALW×2) → Adr₀
             Adr₀ ≥ (ALW×2) → mod(Adr₀, ALW×2)
    for (i = 0; i < ALW×2; i++)
    {
        Var₀ = (i, NumLB)
        Offset₀ = { Var₀ != 0 → 0
                    Var₀ == 0 → NumLB
        Offset₁ = Offset₀ × Adr₁
        if (mod(i, ALW) == 0)
        {
            Adr₂ = (i == 0) ? 0 : Offset₁
        }
        else
        {
            Adr₂ = Adr_Previous + Adr₁ + Offset₁
        }
        RdAdr = (Adr₂, (ALW×2)−1)
        Adr_Previous = RdAdr
        if (i == 1)
        {
            Adr_1st_NZ = RdAdr
        }
    }
}

It will be appreciated that this scheme can be modified in a relatively straightforward manner in order to store any number of rows of tiles in the buffer 21.

For example, for a maximum supported horizontal resolution of 4096 pixels, for layers having the various different sizes shown in Table 1, the number of rows of tiles that are shown in the table can be stored together in the buffer 21.

FIG. 8 shows schematically a display controller 4 in accordance with another embodiment. FIG. 8 is similar to FIG. 3, and operates in a substantially similar manner. However, in place of the decoder 11 of FIG. 3 is a rotation stage 12, which e.g. operates to rotate received blocks of data.

Thus, in the embodiment of FIG. 8, the read controller 10 operates to read one or more (uncompressed) surfaces from memory 8, and to pass that data to the rotation stage 12. As shown in FIG. 8, these columns are fed to the rotation block 12, which rotates the data as appropriate, and writes the data to the de-tiler buffer 21 in (AXI) words. The de-tiler 20 reads the data out from the buffer 21 in raster scan order as (AXI) words, and feeds it to the layer controller 23.

FIG. 9 shows schematically the operation of the display controller 4 when fetching uncompressed data, rotating the data and converting it into raster line data in accordance with the present embodiment.

As shown in FIG. 9, blocks (tiles) created from bursts of (AXI) words are read from a column of an uncompressed surface in external memory 8 (step 100). The pixel data within the block (tile) is rotated as desired by the rotation stage 12 (step 101). Rotated blocks are then written to the de-tiler buffer 21 (step 102). Data words (AXI words) are read from the de-tiler buffer 21 and sent to the pixel unpacker 23 (step 103). Pixels are extracted from the (AXI) words and sent to the pixel processing pipeline 16 at a rate of one pixel per clock cycle (step 104).

It can be seen from the above that embodiments of the technology described herein enable reduction of power consumption within a data processing system, e.g. where block (tile) data is converted to line data by writing the block data to a memory and then reading the data in the form of lines from the memory. This is achieved, in embodiments at least, by storing blocks of data of a second row of blocks of data in the memory at memory addresses that fall between memory addresses at which blocks of data of a first row of blocks of data are stored in a sequence of memory addresses for the memory.

The foregoing detailed description has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the technology to the precise form disclosed. Many modifications and variations are possible in the light of the above teaching. The described embodiments were chosen in order to best explain the principles of the technology and its practical application, to thereby enable others skilled in the art to best utilise the technology in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope be defined by the claims appended hereto.

What is claimed is:
1. A method of operating a data processing system comprising: producing data in the form of blocks of data, where each block of data represents a particular region of an output data array; storing the data in a memory of the data processing system; and reading the data from the memory in the form of lines; wherein storing the data in the memory comprises: storing each block of data of a first row of blocks of data in the memory at one or more memory addresses of a first set of memory addresses of a sequence of memory addresses for the memory; and storing each block of data of a second row of blocks of data in the memory at one or more memory addresses of a second set of different memory addresses of the sequence of memory addresses for the memory; wherein at least some of the memory addresses of the second set of memory addresses fall between memory addresses of the first set of memory addresses in the sequence of memory addresses for the memory.
2. The method of claim 1, wherein the memory has a size that is sufficient to store only one row of blocks of data of a maximum output data array size that the data processing system is configured to produce.
3. The method of claim 1, wherein the method comprises: when the size of each row of blocks of data is less than or equal to half the size of the memory: storing each block of data of the first row of blocks of data in the memory at one or more memory addresses of the first set of memory addresses of the sequence of memory addresses for the memory, and storing each block of data of the second row of blocks of data in the memory at one or more memory addresses of the second set of different memory addresses of the sequence of memory addresses for the memory; but when the size of each row of blocks of data is greater than half of the size of the memory: storing each block of data of a single row of blocks of data in the memory.
4. The method of claim 1, further comprising selecting a number of rows of blocks of data to store together in the memory depending on the size of the output data array.
5. The method of claim 1, wherein the sequence of memory addresses corresponds at least in part to a read sequence of memory addresses for reading data from the memory in the form of lines.
6. The method of claim 1, wherein for each row of blocks of data, adjacent blocks of data of the row are stored at memory addresses that are separated in the sequence of memory addresses.
7. The method of claim 1, wherein reading the data from the memory in the form of lines comprises reading data from plural sets of contiguous memory addresses, wherein each contiguous set of memory addresses is separated by a particular number of memory addresses.
8. The method of claim 7, wherein the particular number of memory addresses depends on the number of rows of blocks of data that are stored in the memory.
9. A method of operating a data processing system comprising: producing data in the form of blocks of data, where each block of data represents a particular region of an output data array; storing the data in a memory of the data processing system; and reading the data from the memory in the form of lines; wherein reading the data from the memory in the form of lines comprises reading data from plural contiguous sets of memory addresses for the memory; and wherein the plural contiguous sets of memory addresses are each separated by a particular number of memory addresses that depends on the size of the output data array.
10. The method of claim 1, further comprising causing at least some of the data and/or a processed version of at least some of the data to be displayed.
11. A data processing system comprising: first processing stage circuitry operable to produce data in the form of plural blocks of data, where each block of data represents a particular region of an output data array; second processing stage circuitry operable to read the data from the memory in the form of lines; and a memory; wherein the data processing system is operable to store the data in the memory by: storing each block of data of a first row of blocks of data in the memory at one or more memory addresses of a first set of memory addresses of a sequence of memory addresses for the memory; and storing each block of data of a second row of blocks of data in the memory at one or more memory addresses of a second set of different memory addresses of the sequence of memory addresses for the memory; wherein at least some of the memory addresses of the second set of memory addresses fall between memory addresses of the first set of memory addresses in the sequence of memory addresses for the memory.
12. The data processing system of claim 11, wherein the memory has a size that is sufficient to store only one row of blocks of data of a maximum output data array size that the data processing system is configured to produce.
13. The data processing system of claim 11, wherein the data processing system is operable to store the data in the memory by: when the size of each row of data blocks is less than or equal to half the size of the memory: storing each block of data of the first row of blocks of data in the memory at one or more memory addresses of the first set of memory addresses of the sequence of memory addresses for the memory, and storing each block of data of the second row of blocks of data in the memory at one or more memory addresses of the second set of different memory addresses of the sequence of memory addresses for the memory; but when the size of each row of data blocks is greater than half of the size of the memory: storing each block of data of a single row of blocks of data in the memory.
14. The data processing system of claim 11, wherein the data processing system is operable to select a number of rows of blocks of data to store together in the memory depending on the size of the output data array.
15. The data processing system of claim 11, wherein the sequence of memory addresses corresponds at least in part to a read sequence of memory addresses for reading data from the memory in the form of lines.
16. The data processing system of claim 11, wherein the data processing system is operable to store adjacent blocks of data of each row at memory addresses that are separated in the sequence of memory addresses.
17. The data processing system of claim 11, wherein the data processing system is operable to read the data from the memory in the form of lines by reading data from plural sets of contiguous memory addresses, wherein each contiguous set of memory addresses is separated by a particular number of memory addresses.
18. The data processing system of claim 17, wherein the particular number of memory addresses depends on the number of rows of blocks of data that are stored in the memory.
19. A data processing system comprising: first processing stage circuitry operable to produce data in the form of blocks of data, where each block of data represents a particular region of an output data array; second processing stage circuitry operable to read the data from the memory in the form of lines; and a memory; wherein the data processing system is configured to read the data from the memory in the form of lines by reading data from plural contiguous sets of memory addresses for the memory; and wherein the plural contiguous sets of memory addresses are each separated by a particular number of memory addresses that depends on the size of the output data array.
20. The data processing system of claim 11, wherein the data processing system is operable to cause at least some of the data and/or a processed version of at least some of the data to be displayed.
21. A non-transitory computer readable storage medium storing computer software code which when executing on a processor performs a method of operating a data processing system comprising: producing data in the form of blocks of data, where each block of data represents a particular region of an output data array; storing the data in a memory of the data processing system; and reading the data from the memory in the form of lines; wherein storing the data in the memory comprises: storing each block of data of a first row of blocks of data in the memory at one or more memory addresses of a first set of memory addresses of a sequence of memory addresses for the memory; and storing each block of data of a second row of blocks of data in the memory at one or more memory addresses of a second set of different memory addresses of the sequence of memory addresses for the memory; wherein at least some of the memory addresses of the second set of memory addresses fall between memory addresses of the first set of memory addresses in the sequence of memory addresses for the memory.